Bug #79700 wrong use of libxml oldNs leads to performance problem
Submitted: 2020-06-15 01:06 UTC
Modified: -
From: beberlei@php.net
Assigned: -
Status: Closed
Package: DOM XML related
PHP Version: 7.4.7
OS: -
Private report: No
CVE-ID: None
 [2020-06-15 01:06 UTC] beberlei@php.net
Description:
------------
In libxml2, the xmlDoc->oldNs pointer is used by the xmlSearchNs* functions to hold the declaration of the reserved xml prefix, i.e. the namespace "http://www.w3.org/XML/1998/namespace".

https://github.com/GNOME/libxml2/blob/mainline/include/libxml/tree.h#L140
https://github.com/GNOME/libxml2/blob/mainline/tree.c#L5992

The Namespaces in XML specification says that the xml prefix MAY, but need not, be declared in a document; either way, it is implicitly bound to this namespace.

This is what xmlDoc->oldNs is for: it holds this one namespace outside the node tree. Every other namespace declaration is normally attached to a node.

PHP's DOM extension, however, uses xmlDoc->oldNs to store the namespaces of new namespaced elements or attributes during reconciliation. Each one is appended to the end of the list without checking for duplicates, so the list grows with every such node and every append has to walk a longer list. The total cost therefore grows quadratically with the number of nodes.

Test script:
---------------
<?php

$dom = new DOMDocument();
$root = $dom->createElementNS('http://www.w3.org/2000/xhtml', 'html');
$dom->appendChild($root);

$s = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $element = $dom->createElementNS('http://www.w3.org/2000/xhtml', 'p', 'Hello World');
    $root->appendChild($element);
}
echo number_format(microtime(true) - $s, 6) . "\n";

$s = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $element = $dom->createElementNS('http://www.w3.org/2000/xhtml', 'p', 'Hello World');
    $root->appendChild($element);
}
echo number_format(microtime(true) - $s, 6) . "\n";

$s = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $element = $dom->createElementNS('http://www.w3.org/2000/xhtml', 'p', 'Hello World');
    $root->appendChild($element);
}
echo number_format(microtime(true) - $s, 6) . "\n";


Expected result:
----------------
Roughly the same time for each block of 10000 nodes.

Actual result:
--------------
The time increases sharply with each consecutive block: from 200 ms to 1200 ms to 2400 ms (on my machine).

History

 [2020-06-15 12:50 UTC] cmb@php.net
The following pull request has been associated:

Patch Name: Fix #79700: Bad performance with namespaced nodes due to wrong libxml assumption
On GitHub:  https://github.com/php/php-src/pull/5719
Patch:      https://github.com/php/php-src/pull/5719.patch
 [2023-06-08 17:44 UTC] git@php.net
Automatic comment on behalf of nielsdos
Revision: https://github.com/php/php-src/commit/a38e3c999ffd1d965692898901a449d511c56381
Log: Fix #79700: Bad performance with namespaced nodes due to wrong libxml assumption
 [2023-06-08 17:44 UTC] git@php.net
-Status: Open +Status: Closed
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 12:01:29 2024 UTC