Bug #79700 wrong use of libxml oldNs leads to performance problem
Submitted: 2020-06-15 01:06 UTC Modified: -
From: beberlei@php.net Assigned:
Status: Closed Package: DOM XML related
PHP Version: 7.4.7 OS:
Private report: No CVE-ID: None
 [2020-06-15 01:06 UTC] beberlei@php.net
Description:
------------
In libxml, the xmlDoc->oldNs pointer is used by xmlSearchNs* functions for holding the xmlns:xml namespace "http://www.w3.org/XML/1998/namespace". 

https://github.com/GNOME/libxml2/blob/mainline/include/libxml/tree.h#L140
https://github.com/GNOME/libxml2/blob/mainline/tree.c#L5992

The XML namespaces specification says that the xml prefix MAY, but need not, be declared in a document; it is bound to this namespace implicitly either way.

That is what xmlDoc->oldNs is for: it holds this namespace outside the node tree, whereas a namespace declaration is normally attached to a node.

PHP DOM, however, uses xmlDoc->oldNs to store the namespaces of new namespaced elements or attributes during reconciliation. Combined with a list append that does not check for duplicates, the list gains an entry for every such node and each append has to walk the whole list, so the total cost grows much faster than linearly (roughly quadratically) in the number of nodes.
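
The cost of such an append can be illustrated with a toy model. This is only a sketch, not the ext/dom or libxml code: NsEntry, appendWithoutDedup() and totalSteps() are hypothetical names, and the list is assumed to be singly linked with a head pointer only, so every append walks to the tail first.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for one namespace entry in a singly linked list.
final class NsEntry
{
    public string $href;
    public ?NsEntry $next = null;

    public function __construct(string $href)
    {
        $this->href = $href;
    }
}

/**
 * Appends $href at the tail without checking for duplicates.
 * Returns [head of list, number of links walked to reach the tail].
 */
function appendWithoutDedup(?NsEntry $head, string $href): array
{
    $entry = new NsEntry($href);
    if ($head === null) {
        return [$entry, 0];
    }
    $steps = 0;
    $cur = $head;
    while ($cur->next !== null) {
        $cur = $cur->next;
        $steps++;
    }
    $cur->next = $entry;
    return [$head, $steps];
}

/** Total links walked when the same namespace is appended $appends times. */
function totalSteps(int $appends): int
{
    $head = null;
    $total = 0;
    for ($i = 0; $i < $appends; $i++) {
        [$head, $steps] = appendWithoutDedup($head, 'http://www.w3.org/2000/xhtml');
        $total += $steps;
    }
    return $total;
}

printf("1000 appends: %d links walked\n", totalSteps(1000));
printf("2000 appends: %d links walked\n", totalSteps(2000));
```

In this model, n appends walk (n-1)(n-2)/2 links in total: 498501 for 1000 appends and 1997001 for 2000, so doubling the number of nodes roughly quadruples the work.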

Test script:
---------------
<?php

$dom = new DOMDocument();
$root = $dom->createElementNS('http://www.w3.org/2000/xhtml', 'html');
$dom->appendChild($root);

$s = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $element = $dom->createElementNS('http://www.w3.org/2000/xhtml', 'p', 'Hello World');
    $root->appendChild($element);
}
echo number_format(microtime(true) - $s, 6) . "\n";

$s = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $element = $dom->createElementNS('http://www.w3.org/2000/xhtml', 'p', 'Hello World');
    $root->appendChild($element);
}
echo number_format(microtime(true) - $s, 6) . "\n";

$s = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $element = $dom->createElementNS('http://www.w3.org/2000/xhtml', 'p', 'Hello World');
    $root->appendChild($element);
}
echo number_format(microtime(true) - $s, 6) . "\n";


Expected result:
----------------
Roughly the same time for each block of 10000 nodes.

Actual result:
--------------
Time increases massively for each consecutive block: from 200 ms to 1200 ms to 2400 ms (on my machine).
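
To check the growth with other batch sizes, the test script can be folded into a helper. This is a sketch only: timeBatches() is a hypothetical name, it needs the dom extension just like the script above, and the timings it prints depend on the machine and PHP version.

```php
<?php
declare(strict_types=1);

/**
 * Appends $batches batches of $size namespaced <p> elements to a single
 * document and returns the wall-clock seconds spent on each batch.
 */
function timeBatches(int $batches, int $size): array
{
    $ns = 'http://www.w3.org/2000/xhtml';
    $dom = new DOMDocument();
    $root = $dom->createElementNS($ns, 'html');
    $dom->appendChild($root);

    $timings = [];
    for ($b = 0; $b < $batches; $b++) {
        $start = microtime(true);
        for ($i = 0; $i < $size; $i++) {
            $root->appendChild($dom->createElementNS($ns, 'p', 'Hello World'));
        }
        $timings[] = microtime(true) - $start;
    }
    return $timings;
}

// Same workload as the test script: three batches of 10000 nodes.
foreach (timeBatches(3, 10000) as $n => $seconds) {
    printf("batch %d: %.6f s\n", $n + 1, $seconds);
}
```

If the cost per node is constant, the printed times should be roughly equal; times that keep rising from batch to batch point to the behaviour described in this report.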

 [2020-06-15 12:50 UTC] cmb@php.net
The following pull request has been associated:

Patch Name: Fix #79700: Bad performance with namespaced nodes due to wrong libxml assumption
On GitHub:  https://github.com/php/php-src/pull/5719
Patch:      https://github.com/php/php-src/pull/5719.patch
 [2023-06-08 17:44 UTC] git@php.net
Automatic comment on behalf of nielsdos
Revision: https://github.com/php/php-src/commit/a38e3c999ffd1d965692898901a449d511c56381
Log: Fix #79700: Bad performance with namespaced nodes due to wrong libxml assumption
 [2023-06-08 17:44 UTC] git@php.net
-Status: Open
+Status: Closed