Bug #79701 getElementById does not correctly work with duplicate definitions
Submitted: 2020-06-15 02:17 UTC
Modified: 2020-06-15 08:23 UTC
From: beberlei@php.net
Assigned:
Status: Closed
Package: DOM XML related
PHP Version: 7.2.31
OS:
Private report: No
CVE-ID: None
 [2020-06-15 02:17 UTC] beberlei@php.net
Description:
------------
A DOM element ID is supposed to be unique, but sometimes the world is messy. The DOM specification says:

> The getElementById(elementId) method, when invoked, must return the first element,
> in tree order, within this’s descendants, whose ID is elementId, and null if there
> is no such element otherwise. 

PHP uses libxml's ID map functionality, which does not allow duplicates. The return value of "xmlAddID" is not checked for errors, so adding an element with a duplicate ID does not raise any error.

However, if you remove the first element with a given ID, or re-order the elements, the specification's requirement to return the first element in tree order is no longer met.
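For comparison, the lookup the specification describes can be sketched in userland. This is only an illustration, not how ext/dom works internally: the function name firstElementById is hypothetical, and it assumes the ID attribute is literally named "id" instead of consulting libxml's ID map.

```php
<?php

/**
 * Hypothetical helper (not part of ext/dom): return the first descendant
 * of $root, in tree order, whose "id" attribute equals $id, or null.
 * Assumption: the ID attribute is literally named "id".
 */
function firstElementById(DOMNode $root, string $id): ?DOMElement
{
    foreach ($root->childNodes as $child) {
        if (!$child instanceof DOMElement) {
            continue; // skip text nodes, comments, etc.
        }
        // Pre-order: test the element itself before its descendants.
        if ($child->hasAttribute('id') && $child->getAttribute('id') === $id) {
            return $child;
        }
        $found = firstElementById($child, $id);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}
```

Because it walks the live tree on every call, this sketch is not affected by removals or re-ordering, at the cost of a full traversal instead of a map lookup.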

Test script:
---------------
<?php

$dom = new DOMDocument();
$root = $dom->createElement('html');
$dom->appendChild($root);

$el1 = $dom->createElement('p1');
$el1->setAttribute('id', 'foo');
$el1->setIdAttribute('id', true);

$root->appendChild($el1);

$el2 = $dom->createElement('p2');
$el2->setAttribute('id', 'foo');
$el2->setIdAttribute('id', true);

$root->appendChild($el2);
unset($el1, $el2);

$root->removeChild($dom->getElementById('foo'));

var_dump($dom->getElementById('foo'));


Expected result:
----------------
Returns Element <p2 id="foo" />

Actual result:
--------------
NULL

 [2020-06-15 08:23 UTC] cmb@php.net
-Status: Open +Status: Verified
 [2020-06-15 08:23 UTC] cmb@php.net
Confirmed: <https://3v4l.org/Ols6m>
 [2024-05-21 08:46 UTC] dennis6101990leon at outlook dot com
It seems you’ve encountered a known issue when dealing with non-unique IDs within a DOM in PHP. The getElementById method is indeed designed to return the first element with the specified ID. However, when there are duplicates, and the first element is removed, it can lead to unexpected behavior because the ID map does not update as one might expect.

Here’s a workaround for this issue: instead of relying on getElementById, you can use other methods, such as getElementsByTagName or an XPath query via DOMXPath, to retrieve all elements with the same ID and then handle them accordingly. Here’s an example using DOMXPath:

<?php

$dom = new DOMDocument();
$root = $dom->createElement('html');
$dom->appendChild($root);

$el1 = $dom->createElement('p');
$el1->setAttribute('id', 'foo');
$root->appendChild($el1);

$el2 = $dom->createElement('p');
$el2->setAttribute('id', 'foo');
$root->appendChild($el2);

// Use an XPath query to get all elements with the same ID
$elements = new DOMXPath($dom);
$results = $elements->query("//*[@id='foo']");

// Remove the first element with the ID 'foo'
$root->removeChild($results->item(0));

// Now, let's check if the second element with ID 'foo' is still accessible
var_dump($results->item(1));

?>

This script dumps the second <p> element with the ID ‘foo’, as expected. Remember that having multiple elements with the same ID is not valid HTML, so it’s best to avoid such situations when possible.
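To spot such situations before they cause trouble, a document can be scanned for duplicated IDs. The following is a minimal sketch: findDuplicateIds is a hypothetical helper, and it assumes the ID attribute is named "id".

```php
<?php

/**
 * Hypothetical helper: map each duplicated "id" value to the number of
 * elements that carry it. Assumes the ID attribute is named "id".
 */
function findDuplicateIds(DOMDocument $dom): array
{
    $counts = [];
    $xpath = new DOMXPath($dom);
    // Select every element that has an id attribute at all.
    foreach ($xpath->query('//*[@id]') as $el) {
        $id = $el->getAttribute('id');
        $counts[$id] = ($counts[$id] ?? 0) + 1;
    }
    // Keep only the IDs that occur more than once.
    return array_filter($counts, function ($n) {
        return $n > 1;
    });
}
```

An empty array means every "id" value in the document is unique.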
 [2024-06-01 10:54 UTC] git@php.net
Automatic comment on behalf of nielsdos
Revision: https://github.com/php/php-src/commit/8dc2391bae9652a0d59ac02c60bb52aeaa184d74
Log: Fix bug #79701: getElementById does not correctly work with duplicate definitions
 [2024-06-01 10:54 UTC] git@php.net
-Status: Verified +Status: Closed
 