PHP :: Bug #79211 :: DomXPath::query() has very inconsistent performance

Bug #79211	DomXPath::query() has very inconsistent performance
Submitted:	2020-02-02 04:35 UTC	Modified:	2021-09-09 09:52 UTC
From:	tom at r dot je	Assigned:	cmb (profile)
Status:	Not a bug	Package:	DOM XML related
PHP Version:	7.4.2	OS:	Linux
Private report:	No	CVE-ID:	None

[2020-02-02 04:35 UTC] tom at r dot je

Description:
------------
Here are three sample queries.

Equivalent of the CSS `tr td.one`:
//tr//td[contains(concat(' ', normalize-space(@class), ' '), ' one ')]

Equivalent of the CSS `td.one`:
//td[contains(concat(' ', normalize-space(@class), ' '), ' one ')]

Equivalent of the CSS `tr td`:
//tr//td

There's a test script below that uses these queries on a 1000x6 HTML table.

I'd expect them all to take roughly the same amount of time to run.

Test script:
---------------
<?php
$xml = '<table>';
for ($i = 0; $i < 1000; $i++) {
	$xml .= '<tr><td class="one">A</td><td class="two">B</td><td class="three">C</td><td class="four">D</td><td class="five">E</td><td class="six">F</td>
	</tr>';
}

$xml .= '</table>';


$doc = new \DomDocument;
$doc->loadXml($xml);

$xpath = new \DomXpath($doc);

$t1 = microtime(true);
// tr td.one
$xpath->query('//tr//td[contains(concat(\' \', normalize-space(@class), \' \'), \' one \')]');
$t2 = microtime(true);


echo '<p>tr td.one: ' . ($t2 - $t1) . '</p>';

$t1 = microtime(true);
// td.one
$xpath->query('//td[contains(concat(\' \', normalize-space(@class), \' \'), \' one \')]');
$t2 = microtime(true);


echo '<p>td.one: ' . ($t2 - $t1) . '</p>';
$t1 = microtime(true);
// tr td
$xpath->query('//tr//td');
$t2 = microtime(true);

echo '<p>tr td: ' . ($t2 - $t1) . '</p>';

Expected result:
----------------
While I'd expect some variation in speed, the first expression seems unreasonably slow in comparison to the others.


Actual result:
--------------
tr td.one: 0.26365518569946

td.one: 0.0054380893707275

tr td: 0.00074887275695801


Finding all td elements by class name is fast.
Finding all td elements inside a tr is fast.
Combining the two, finding all td elements by class inside a tr, is roughly 50 times slower than just finding all td elements by class.

Since the first expression is a combination of the other two, how can it be 50 times slower? It's clearly not the `concat` and `normalize-space` functions that are slowing it down, because the second expression is fine.
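
The timings above come from a single `microtime()` pair per query, which is sensitive to warm-up and scheduler noise. The following is a minimal sketch, not part of the original report, that repeats each query and reports the median. The helper name `medianQueryTime` and the run count of 5 are illustrative assumptions.

```php
<?php
// Hypothetical helper (not from the original report): run one XPath
// expression several times and return the median wall-clock duration
// in seconds. With an odd $runs this is the exact median.
function medianQueryTime(DOMXPath $xpath, string $expression, int $runs = 5): float
{
    $timings = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                       // monotonic clock, nanoseconds
        $xpath->query($expression);
        $timings[] = (hrtime(true) - $start) / 1e9;  // convert to seconds
    }
    sort($timings);
    return $timings[intdiv(count($timings), 2)];
}

// Build the same 1000x6 table shape as the test script.
$row = '<tr>';
foreach (['one', 'two', 'three', 'four', 'five', 'six'] as $class) {
    $row .= '<td class="' . $class . '">x</td>';
}
$row .= '</tr>';

$doc = new DOMDocument();
$doc->loadXML('<table>' . str_repeat($row, 1000) . '</table>');
$xpath = new DOMXPath($doc);

$classTest = "contains(concat(' ', normalize-space(@class), ' '), ' one ')";
$queries = [
    'tr td.one' => "//tr//td[$classTest]",
    'td.one'    => "//td[$classTest]",
    'tr td'     => '//tr//td',
];

$medians = [];
foreach ($queries as $label => $expression) {
    $medians[$label] = medianQueryTime($xpath, $expression);
    printf("%-10s %.6f s\n", $label, $medians[$label]);
}
```

Absolute numbers will vary with the PHP and libxml2 versions in use, so the ratios between the three queries are the more useful figure to compare.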


[2021-09-09 09:52 UTC] cmb@php.net

-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb

[2021-09-09 09:52 UTC] cmb@php.net

I get rather different results with libxml 2.9.10 on PHP-7.4:

0.59029412269592
0.31803107261658
0.0068929195404053

This is not really unexpected, but isn't a PHP issue, anyway,
since the performance difference would actually be caused by the
libxml2 function xmlXPathEvalExpression().  Feel free to report
that issue upstream.
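
For context when preparing an upstream report: in XPath 1.0, `//` is an abbreviation for `/descendant-or-self::node()/`, so `//tr//td[...]` contains two descendant scans. The sketch below is an illustration, not part of the original thread. It spells out that expansion and checks that it and a `descendant::` variant select the same cells on a smaller table. Whether either variant is faster depends on the libxml2 version and should be measured rather than assumed.

```php
<?php
// Illustration only: a smaller table (100 rows) keeps the check quick.
$row = '<tr><td class="one">A</td><td class="two">B</td></tr>';
$doc = new DOMDocument();
$doc->loadXML('<table>' . str_repeat($row, 100) . '</table>');
$xpath = new DOMXPath($doc);

$classTest = "contains(concat(' ', normalize-space(@class), ' '), ' one ')";

$variants = [
    // Abbreviated form from the report.
    'abbreviated' => "//tr//td[$classTest]",
    // The same expression with each // written out per XPath 1.0.
    'expanded'    => "/descendant-or-self::node()/child::tr"
                   . "/descendant-or-self::node()/child::td[$classTest]",
    // Selects the same nodes here because the predicate is not positional.
    'descendant'  => "//tr/descendant::td[$classTest]",
];

$counts = [];
foreach ($variants as $label => $expression) {
    $counts[$label] = $xpath->query($expression)->length;
    echo $label, ': ', $counts[$label], " nodes\n";
}
```

Timing these variants against each other on the full 1000x6 table would help show whether the cost comes from the nested descendant scan or from the predicate.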


