Request #53655 Improve speed of DOMNode::C14N() on large XML documents
Submitted: 2011-01-05 08:33 UTC Modified: 2023-09-23 19:01 UTC
Votes:21
Avg. Score:4.6 ± 0.7
Reproduced:17 of 17 (100.0%)
Same Version:4 (23.5%)
Same OS:7 (41.2%)
From: olav dot morken at uninett dot no Assigned: nielsdos (profile)
Status: Closed Package: DOM XML related
PHP Version: 5.3.4 OS:
Private report: No CVE-ID: None

 [2011-01-05 08:33 UTC] olav dot morken at uninett dot no
Description:
------------
The C14N() function appears to have a runtime that is O(N^2) (or possibly worse) in the size of the input, which means that it becomes very slow as the input grows. For example, an input with around 196000 nodes takes about 290 seconds, while an input with 486000 nodes takes 2200 seconds: roughly 2.5 times the nodes takes about 7.6 times as long.

Note that this problem only occurs when canonicalizing a subtree of the document. If we canonicalize the whole document, it completes almost immediately.
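The difference between the two cases can be observed with a small self-contained script. This is only a sketch: the helper name buildDocument(), the element names, and the document size are arbitrary choices for illustration, and the timings will depend on the PHP and libxml2 versions in use.

```php
<?php
// Hypothetical helper: builds a synthetic document in memory.
// The element names and the item count are arbitrary.
function buildDocument(int $items): DOMDocument
{
    $doc = new DOMDocument();
    $root = $doc->appendChild($doc->createElement('root'));
    for ($i = 0; $i < $items; $i++) {
        $item = $doc->createElement('item', 'value ' . $i);
        $item->setAttribute('id', (string) $i);
        $root->appendChild($item);
    }
    return $doc;
}

$doc = buildDocument(2000);

// Canonicalize the whole document.
$start = microtime(true);
$whole = $doc->C14N(false, false);
$wholeTime = microtime(true) - $start;

// Canonicalize the subtree rooted at the document element.
$start = microtime(true);
$subtree = $doc->documentElement->C14N(false, false);
$subtreeTime = microtime(true) - $start;

printf("Whole document: %.4f s\n", $wholeTime);
printf("Subtree:        %.4f s\n", $subtreeTime);
```

Increasing the argument to buildDocument() should make the gap between the two timings more visible on affected versions.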

The problem is that canonicalization uses an XPath expression to find the nodeset that should be canonicalized. Evaluation of the XPath expression takes a lot of time as the input size grows, and the libxml2 xmlC14NDocSaveTo() function also has to do a lookup in the nodeset returned by the XPath expression for every node it encounters.
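To get a feeling for the size of that nodeset, the selection can be reproduced from PHP with DOMXPath. The expression below is an assumption about what the extension evaluates internally (every node, attribute, and namespace node at or below the context node); it is meant as an illustration, not as a quote from the source code.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><item id="1">a</item><item id="2">b</item></root>');

$xpath = new DOMXPath($doc);

// Assumed expression: all nodes, attributes and namespace nodes
// at or below the context node (here, the document element).
$nodes = $xpath->query('(.//. | .//@* | .//namespace::*)', $doc->documentElement);

echo "Nodes in set: ", $nodes->length, "\n";
```

If every node visited during serialization is checked against such a set with a linear scan, the total work grows with the square of the set size, which would match the timings reported above.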

I believe a better solution would be to do this the way it is done in the xmlsec library. That library uses the xmlC14NExecute() function instead, which accepts a callback that determines whether a node should be included in the result. This should make the speed of canonicalization linear in the input size.
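The idea behind such a callback can be illustrated in PHP. The real callback would be a C function passed to xmlC14NExecute(); the function isInSubtree() below is a hypothetical stand-in that only shows the kind of test it could perform: walk up the parent chain and check whether the subtree root is reached. The cost of one check depends on the depth of the node, not on the total number of nodes.

```php
<?php
// Hypothetical predicate, named for illustration only: reports whether $node
// lies inside the subtree rooted at $root by walking up the parent chain.
function isInSubtree(DOMNode $node, DOMNode $root): bool
{
    for ($current = $node; $current !== null; $current = $current->parentNode) {
        if ($current->isSameNode($root)) {
            return true;
        }
    }
    return false;
}

$doc = new DOMDocument();
$doc->loadXML('<a><b><c/></b><d/></a>');
$b = $doc->documentElement->firstChild; // <b>
$c = $b->firstChild;                    // <c>, inside <b>
$d = $b->nextSibling;                   // <d>, outside <b>

var_dump(isInSubtree($c, $b)); // bool(true)
var_dump(isInSubtree($d, $b)); // bool(false)
```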


Test script:
---------------
<?php
$doc = new DOMDocument();
$doc->load('some-large-xml-file.xml');
$start = microtime(TRUE);
$doc->documentElement->C14N(FALSE, FALSE);
echo "Done in " . (microtime(TRUE) - $start) . " seconds.\n";



History
 [2011-01-05 19:26 UTC] rrichards@php.net
-Status: Open
+Status: Assigned
-Assigned To:
+Assigned To: rrichards
 [2012-09-07 09:47 UTC] M dot Slowe at kent dot ac dot uk
This problem can apparently be worked around by patching xmlseclibs.php (in SimpleSAMLphp, at least).

See https://groups.google.com/forum/#!msg/simplesamlphp/b6fTf53iq4w/uNhw_NBNzxkJ for details. The patch is at:

http://pastebin.com/byVmBXHQ
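
The following is not the linked patch. It is a minimal sketch of one possible workaround, based only on the observation in the original report that canonicalizing the whole document is fast. The function name fastC14N() is hypothetical. It assumes that when the node is the document element and the document has no comments or processing instructions outside it, canonicalizing the document yields the same output as canonicalizing the element; in every other case it falls back to the normal (slow) call.

```php
<?php
// Hypothetical wrapper: use the fast whole-document path when it is
// expected to produce the same result as canonicalizing the root element.
function fastC14N(DOMNode $node, bool $exclusive = false, bool $withComments = false)
{
    $doc = $node->ownerDocument;
    if ($doc !== null
        && $doc->documentElement !== null
        && $node->isSameNode($doc->documentElement)
    ) {
        $rootOnly = true;
        foreach ($doc->childNodes as $child) {
            // Comments or processing instructions next to the root element
            // could appear in the document-level output, so play safe.
            if ($child->nodeType === XML_PI_NODE || $child->nodeType === XML_COMMENT_NODE) {
                $rootOnly = false;
                break;
            }
        }
        if ($rootOnly) {
            return $doc->C14N($exclusive, $withComments);
        }
    }
    // Any other node: fall back to the regular subtree canonicalization.
    return $node->C14N($exclusive, $withComments);
}

$doc = new DOMDocument();
$doc->loadXML('<root><item id="1">x</item></root>');
echo fastC14N($doc->documentElement), "\n";
```

This only helps when the subtree being canonicalized is the document element itself; inner subtrees still take the slow path.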
 [2017-10-24 06:14 UTC] kalle@php.net
-Status: Assigned
+Status: Open
-Assigned To: rrichards
+Assigned To:
 [2023-09-23 19:01 UTC] nielsdos@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: nielsdos
 [2023-09-23 19:01 UTC] nielsdos@php.net
The fix for this bug has been committed.
If you are still experiencing this bug, try to check out latest source from https://github.com/php/php-src and re-test.
Thank you for the report, and for helping us make PHP better.

Fixed in https://github.com/php/php-src/commit/5d68d619436baeaeea78adae51dbd538d62abe66
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 21 15:01:29 2024 UTC