Bug #79211 DomXPath::query() has very inconsistent performance
Submitted: 2020-02-02 04:35 UTC
Modified: 2021-09-09 09:52 UTC
From: tom at r dot je
Assigned: cmb
Status: Not a bug
Package: DOM XML related
PHP Version: 7.4.2
OS: Linux
Private report: No
CVE-ID: None

 [2020-02-02 04:35 UTC] tom at r dot je
Description:
------------
Here are three sample queries

Equivalent of the CSS `tr td.one`

//tr//td[contains(concat(' ', normalize-space(@class), ' '), ' one ')]

Equivalent of the CSS `td.one`
//td[contains(concat(' ', normalize-space(@class), ' '), ' one ')]

Equivalent of the CSS `tr td`
//tr//td

There's a test script below that uses these queries on a 1000x6 HTML table.

I'd expect them all to take roughly the same amount of time to run.

Test script:
---------------
<?php
$xml = '<table>';
for ($i = 0; $i < 1000; $i++) {
	$xml .= '<tr><td class="one">A</td><td class="two">B</td><td class="three">C</td><td class="four">D</td><td class="five">E</td><td class="six">F</td>
	</tr>';
}

$xml .= '</table>';


$doc = new \DOMDocument;
$doc->loadXML($xml);

$xpath = new \DOMXPath($doc);

$t1 = microtime(true);
// tr td.one
$xpath->query('//tr//td[contains(concat(\' \', normalize-space(@class), \' \'), \' one \')]');
$t2 = microtime(true);


echo '<p>tr td.one: ' . ($t2 - $t1) . '</p>';

$t1 = microtime(true);
// td.one
$xpath->query('//td[contains(concat(\' \', normalize-space(@class), \' \'), \' one \')]');
$t2 = microtime(true);


echo '<p>td.one: ' . ($t2 - $t1) . '</p>';
$t1 = microtime(true);
// tr td
$xpath->query('//tr//td');
$t2 = microtime(true);

echo '<p>tr td: ' . ($t2 - $t1) . '</p>';

Expected result:
----------------
While I'd expect some variation in speed, the first expression seems unreasonably slow in comparison to the others.


Actual result:
--------------
tr td.one: 0.26365518569946

td.one: 0.0054380893707275

tr td: 0.00074887275695801


Finding all td elements by class name is fast.
Finding all td elements inside a tr is fast.
Combining the two, finding all td elements by class inside a tr, is about 50 times slower than just finding all td elements by class.

Since the first expression is a combination of the other two, how can it be 50 times slower? The `concat` and `normalize-space` functions are clearly not what slows it down, because the second expression uses them too and is fast.
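
As a sanity check on the claim that the first expression is a combination of the other two, the node counts can be compared. This is a minimal sketch, assuming the same 1000x6 table as in the test script; `buildTable` is a hypothetical helper and not part of the original script. If the first two queries select the same number of nodes, the slowdown is not caused by a larger result set.

```php
<?php
// Hypothetical helper: rebuilds the 1000x6 table used in the test script.
function buildTable(int $rows): DOMDocument
{
    $xml = '<table>';
    for ($i = 0; $i < $rows; $i++) {
        $xml .= '<tr><td class="one">A</td><td class="two">B</td><td class="three">C</td>'
              . '<td class="four">D</td><td class="five">E</td><td class="six">F</td></tr>';
    }
    $xml .= '</table>';

    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc;
}

$xpath = new DOMXPath(buildTable(1000));

// Class test shared by the first two queries (no PHP escaping needed in double quotes).
$classTest = "contains(concat(' ', normalize-space(@class), ' '), ' one ')";

$counts = [
    'tr td.one' => $xpath->query("//tr//td[{$classTest}]")->length,
    'td.one'    => $xpath->query("//td[{$classTest}]")->length,
    'tr td'     => $xpath->query('//tr//td')->length,
];

foreach ($counts as $label => $count) {
    echo "{$label}: {$count}\n";
}
```

With one `td.one` per row and six cells per row, the first two queries should each select 1000 nodes and the third 6000.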




 [2021-09-09 09:52 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2021-09-09 09:52 UTC] cmb@php.net
I get rather different results with libxml 2.9.10 on PHP-7.4:

0.59029412269592
0.31803107261658
0.0068929195404053

This is not really unexpected, but isn't a PHP issue, anyway,
since the performance difference would actually be caused by the
libxml2 function xmlXPathEvalExpression().  Feel free to report
that issue upstream.
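
For anyone following up on this: in XPath 1.0, `//` is an abbreviation for `/descendant-or-self::node()/`, so `//tr//td[...]` evaluates the predicate step below every node under every `tr`. The following is a minimal sketch, assuming the same table as in the test script, that times the original expression against two rewrites that select the same nodes in this particular document. `timeQuery` is a hypothetical helper, `//tr/td[...]` is only equivalent because every `td` here is a direct child of a `tr`, and whether the rewrites are actually faster depends on the libxml2 version, so no timings are claimed.

```php
<?php
// Same 1000x6 table as in the test script, built with str_repeat().
$row = '<tr><td class="one">A</td><td class="two">B</td><td class="three">C</td>'
     . '<td class="four">D</td><td class="five">E</td><td class="six">F</td></tr>';
$doc = new DOMDocument();
$doc->loadXML('<table>' . str_repeat($row, 1000) . '</table>');
$xpath = new DOMXPath($doc);

// Hypothetical helper: returns [node count, elapsed seconds] for one query.
function timeQuery(DOMXPath $xpath, string $expr): array
{
    $start = hrtime(true);                       // monotonic clock, nanoseconds
    $nodes = $xpath->query($expr);
    $elapsed = (hrtime(true) - $start) / 1e9;
    return [$nodes->length, $elapsed];
}

$pred = "[contains(concat(' ', normalize-space(@class), ' '), ' one ')]";

$queries = [
    'original //tr//td'      => '//tr//td' . $pred,
    'explicit descendant::'  => '//tr/descendant::td' . $pred,
    'child step //tr/td'     => '//tr/td' . $pred,
];

$results = [];
foreach ($queries as $label => $expr) {
    [$count, $seconds] = timeQuery($xpath, $expr);
    $results[$label] = $count;
    printf("%-24s %5d nodes  %.6f s\n", $label, $count, $seconds);
}
```

All three variants should report the same node count; only the timings are expected to differ between libxml2 builds.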
 