php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #80332 Completely broken array access functionality with DOMNamedNodeMap
Submitted: 2020-11-07 07:00 UTC Modified: 2020-11-07 12:12 UTC
Votes:1
Avg. Score:4.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: asmqb7 at gmail dot com Assigned:
Status: Closed Package: DOM XML related
PHP Version: 7.4.12 OS: Debian/likely all OSes
Private report: No CVE-ID: None
 [2020-11-07 07:00 UTC] asmqb7 at gmail dot com
Description:
------------
(Apologies for the multiple inline explanatory pseudo-code samples, but I was unsure how to best outline the usability impact of this bug without them. See also the complete test script.)

At first glance, it appears to be possible/acceptable to manipulate DOM element attributes via direct array access. For example, the following code...

  $doc = new DOMDocument;
  $doc->loadHTML('<a href="https://example.com">hi!</a>');
  $a = (new DOMXPath($doc))->query('//a')[0];
  $a->attributes['href']->nodeValue = 'https://example.net';
  print $doc->saveHTML();
	
...appears to work, producing:

  <a href="https://example.net">hi!</a>

An inexperienced or non-oriented developer (and I was both of those yesterday afternoon, with regards to DOMDocument) might stumble on the above approach through blind experimentation and, *after observing that a seemingly complex manipulation produces correct end-to-end results without throwing any errors*, conclude that DOMNamedNodeMap permits and supports direct array access, similar to how direct DOMNode->nodeValue manipulation works elsewhere.

The fact that this approach is straightforward and intuitive - and appears to work - may only serve to lengthen the time the developer goes around in circles suspecting everything but the array access code, when things go sideways in extremely confusing ways.

For example, introducing a second attribute...

  ...
  $doc->loadHTML('<a href="https://example.com" class="myclass">hi!</a>');
  ...
  print $a->attributes['class']->nodeName."\n";
  $a->attributes['class']->nodeValue = 'class2';
  print $doc->saveHTML();

...will print:

  href
  <a href="class2">hi!</a>

This is because (insert "SEVERAL HOURS LATER" graphic here) /whichever key you attempt to read will return the first DOMAttr/, including nonexistent keys:

  ...
  $doc->loadHTML('<a href="https://example.com">hi!</a>');
  ...
  print $a->attributes['IDoNotExist']->nodeName."\n";
  ...

This produces:

  href

I am extremely appreciative DOMDocument is not responsible for holding together the fabric of reality.

On a more serious note, it appears that DOMNamedNodeMap implements Iterable in the most literal sense - as an iterator:

  ...
  $doc->loadHTML('<a href="https://example.com" class="myclass" id="myid">hi!</a>');
  ...
  foreach ($a->attributes as $attr) {
    print $attr->nodeName.': '.$attr->nodeValue."\n";
  }

This produces:
  
  href: https://example.com
  class: myclass
  id: myid

And modifying the foreach to try to write back by reference (&)...

  ...
  ...
  foreach ($a->attributes as &$attr) {
    ...
  ...

...doesn't work:

  Fatal error: Uncaught Error: An iterator cannot be used with foreach by reference

So everything sorta makes sense now; DOMNamedNodeMaps are iterators.

But this does not completely square off the fact that it's possible to read the first attribute in a DOMNamedNodeMap by passing in any string. For conciseness, this area is the sole focus of the test script.

The DOMNamedNodeMap-as-iterators implementation detail is not documented anywhere. I feel that knowing this may have been useful, but only because the implementation is broken.

(NB. Submission-related trivia: Thanks to the fact that the web-bugs repo is online, I was able to change the http:// URLs above to https:// so is_spam() didn't find 5 'http://' substrings and return true :D)

Test script:
---------------
<?php

  $doc = new DOMDocument;
  
  $doc->loadHTML('<span attr1="value1" attr2="value2"></span>');
  
  $x = new DOMXPath($doc);
  $span = $x->query('//span')[0];
  
  print "Node name: {$span->nodeName}\n";

  print "Attribute [0] name: {$span->attributes[0]->nodeName}\n";
  print "Attribute [0] value: {$span->attributes[0]->nodeValue}\n";

  print "Attribute 'hi' name: {$span->attributes['hi']->nodeName}\n";
  print "Attribute 'hi' value: {$span->attributes['hi']->nodeValue}\n";

  print "Attribute 'attr2' name: {$span->attributes['attr2']->nodeName}\n";
  print "Attribute 'attr2' value: {$span->attributes['attr2']->nodeValue}\n";


Expected result:
----------------
The two logical and safe results I would expect would be either

1) Any array access attempt to print a warning or throw an error/exception

2) For array accesses of
  2a) extant attributes to return their associated DOMAttr node
  2a) non-existing attributes to print a warning or throw an error/exception

If it is decided that (2) is not appropriate - for whatever reason - I would emphasize my support for the implementation of (1) as soon as possible, to ensure that any code erroneously using array accesses smashes into an appropriate brick wall so it can be fixed.

Actual result:
--------------
Node name: span
Attribute [0] name: attr1
Attribute [0] value: value1
Attribute 'hi' name: attr1
Attribute 'hi' value: value1
Attribute 'attr2' name: attr1
Attribute 'attr2' value: value1


Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2020-11-07 12:12 UTC] cmb@php.net
-Status: Open +Status: Verified
 [2020-11-07 12:12 UTC] cmb@php.net
It seems to me the culprit is that the read_dimension handler
shares its implementation with DOMNodeList[1], and so assumes
its offsets to be numbers[2].

[1] <https://github.com/php/php-src/blob/php-7.4.12/ext/dom/php_dom.c#L610>
[2] <https://github.com/php/php-src/blob/php-7.4.12/ext/dom/php_dom.c#L1556>
 [2023-06-18 13:21 UTC] git@php.net
Automatic comment on behalf of nielsdos
Revision: https://github.com/php/php-src/commit/9f7d88802e693c61e569988ef8a0b015d32c7668
Log: Fix #80332: Completely broken array access functionality with DOMNamedNodeMap
 [2023-06-18 13:21 UTC] git@php.net
-Status: Verified +Status: Closed
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Apr 23 13:01:29 2024 UTC