Bug #79211 DomXPath::query() has very inconsistent performance
Submitted: 2020-02-02 04:35 UTC
Modified: 2021-09-09 09:52 UTC
From: tom at r dot je
Assigned: cmb
Status: Not a bug
Package: DOM XML related
PHP Version: 7.4.2
OS: Linux
Private report: No
CVE-ID: None
 
 [2020-02-02 04:35 UTC] tom at r dot je
Description:
------------
Here are three sample queries:

Equivalent of the CSS `tr td.one`
//tr//td[contains(concat(' ', normalize-space(@class), ' '), ' one ')]

Equivalent of the CSS `td.one`
//td[contains(concat(' ', normalize-space(@class), ' '), ' one ')]

Equivalent of the CSS `tr td`
//tr//td

There's a test script below that uses these queries on a 1000x6 HTML table.

I'd expect them all to take roughly the same amount of time to run.

Test script:
---------------
<?php
$xml = '<table>';
for ($i = 0; $i < 1000; $i++) {
	$xml .= '<tr><td class="one">A</td><td class="two">B</td><td class="three">C</td><td class="four">D</td><td class="five">E</td><td class="six">F</td>
	</tr>';
}

$xml .= '</table>';


$doc = new \DomDocument;
$doc->loadXml($xml);

$xpath = new \DomXpath($doc);

$t1 = microtime(true);
// tr td.one
$xpath->query('//tr//td[contains(concat(\' \', normalize-space(@class), \' \'), \' one \')]');
$t2 = microtime(true);


echo '<p>tr td.one: ' . ($t2 - $t1) . '</p>';

$t1 = microtime(true);
// td.one
$xpath->query('//td[contains(concat(\' \', normalize-space(@class), \' \'), \' one \')]');
$t2 = microtime(true);


echo '<p>td.one: ' . ($t2 - $t1) . '</p>';
$t1 = microtime(true);
// tr td
$xpath->query('//tr//td');
$t2 = microtime(true);

echo '<p>tr td: ' . ($t2 - $t1) . '</p>';

Expected result:
----------------
While I'd expect some variation in speed, the first expression seems unreasonably slow in comparison to the others.


Actual result:
--------------
tr td.one: 0.26365518569946

td.one: 0.0054380893707275

tr td: 0.00074887275695801


Finding all td elements by class name is fast.
Finding all td elements inside a tr is fast.
Combining the two, finding all td elements by class inside a tr, is roughly 50 times slower than just finding all td elements by class.

Since the first expression is a combination of the other two, how can it be 50 times slower? It's clearly not the `concat` and `normalize-space` functions that slow it down, because the second expression is fine.
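
A single microtime() pair per query is noisy, so a comparison like the one above is easier to trust when each expression is run several times and the node counts are checked alongside the timings. The following is a minimal sketch of that idea, not part of the original report: `timeQuery()` is a hypothetical helper, the run count of 3 is arbitrary, and `hrtime()` assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper (not from the original report): run an XPath
// expression $runs times and keep the node count and the best time.
function timeQuery(DOMXPath $xpath, string $expr, int $runs = 3): array
{
    $best = INF;
    $count = 0;
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                      // monotonic clock, nanoseconds
        $nodes = $xpath->query($expr);
        $elapsed = (hrtime(true) - $start) / 1e9;   // convert to seconds
        if ($nodes === false) {
            throw new RuntimeException("Invalid XPath expression: $expr");
        }
        $count = $nodes->length;
        $best = min($best, $elapsed);
    }
    return ['count' => $count, 'seconds' => $best];
}

// Same shape as the table in the test script: 1000 rows, 6 cells each.
$row = '<tr><td class="one">A</td><td class="two">B</td><td class="three">C</td>'
     . '<td class="four">D</td><td class="five">E</td><td class="six">F</td></tr>';
$doc = new DOMDocument();
$doc->loadXML('<table>' . str_repeat($row, 1000) . '</table>');
$xpath = new DOMXPath($doc);

// The class test shared by the first two queries.
$hasClassOne = "contains(concat(' ', normalize-space(@class), ' '), ' one ')";

$results = [
    'tr td.one' => timeQuery($xpath, "//tr//td[{$hasClassOne}]"),
    'td.one'    => timeQuery($xpath, "//td[{$hasClassOne}]"),
    'tr td'     => timeQuery($xpath, '//tr//td'),
];

foreach ($results as $label => $r) {
    printf("%s: %d nodes, best of 3: %.6f s\n", $label, $r['count'], $r['seconds']);
}
```

The timings will differ between machines and libxml2 versions; only the node counts (1000, 1000 and 6000) are fixed by the input.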




 [2021-09-09 09:52 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2021-09-09 09:52 UTC] cmb@php.net
I get rather different results with libxml 2.9.10 on PHP-7.4:

0.59029412269592
0.31803107261658
0.0068929195404053

This is not really unexpected, but isn't a PHP issue, anyway,
since the performance difference would actually be caused by the
libxml2 function xmlXPathEvalExpression().  Feel free to report
that issue upstream.
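
For anyone hitting this in practice, one thing that can be tried from the PHP side is rewriting the expression so that it contains only one `//` step. By the XPath 1.0 rules, `//tr//td[...]` and `//td[ancestor::tr][...]` select the same nodes. The sketch below only checks that equivalence on a small document; it is an illustration rather than a fix, the `<root>` wrapper and the stray `<td>` are made up for the example, and whether the rewritten form is actually faster depends on the libxml2 version and was not measured here.

```php
<?php
// Small illustrative document: 50 table rows plus one <td> outside any <tr>.
$row = '<tr><td class="one">A</td><td class="two">B</td></tr>';
$doc = new DOMDocument();
$doc->loadXML(
    '<root><td class="one">stray</td><table>' . str_repeat($row, 50) . '</table></root>'
);
$xpath = new DOMXPath($doc);

$hasClassOne = "contains(concat(' ', normalize-space(@class), ' '), ' one ')";

// Collect the node paths of a result set so two result sets can be compared.
function nodePaths(DOMXPath $xpath, string $expr): array
{
    $paths = [];
    foreach ($xpath->query($expr) as $node) {
        $paths[] = $node->getNodePath();
    }
    sort($paths);   // order-independent comparison
    return $paths;
}

$original  = nodePaths($xpath, "//tr//td[{$hasClassOne}]");
$rewritten = nodePaths($xpath, "//td[ancestor::tr][{$hasClassOne}]");
$unscoped  = nodePaths($xpath, "//td[{$hasClassOne}]");

var_dump($original === $rewritten);   // same nodes selected by both forms
printf("scoped: %d, unscoped: %d\n", count($original), count($unscoped));
```

The unscoped query also matches the stray cell, which is why the `ancestor::tr` predicate is needed to keep the meaning of `tr td.one`.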
 