Bug #78221 DOMNode::normalize() doesn't remove empty text nodes
Submitted: 2019-06-27 21:15 UTC
Modified: -
From: cananian at wikimedia dot org
Assigned:
Status: Closed
Package: DOM XML related
PHP Version: 7.3.6
OS:
Private report: No
CVE-ID: None
 [2019-06-27 21:15 UTC] cananian at wikimedia dot org
Description:
------------
Empty text nodes are supposed to be removed by DOMNode::normalize():
* PHP documentation: "Remove empty text nodes" https://www.php.net/manual/en/domnode.normalize.php
* DOM level 2 spec: "There are neither adjacent Text nodes nor empty Text nodes." https://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-normalize
* Latest WHATWG DOM spec: "If length is zero, then remove node and continue with the next exclusive Text node, if any." https://dom.spec.whatwg.org/#dom-node-normalize

The PHP implementation appears to combine adjacent Text nodes, but does not remove zero-length text nodes.

Test script:
---------------
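<?php
# loadHTML() is an instance method: calling it statically is deprecated
# in PHP 7 and throws an Error in PHP 8, so create the document first.
$doc = new DOMDocument();
$doc->loadHTML('<p id=x>foo</p>');
$p = $doc->getElementById('x');
$p->childNodes[0]->textContent = '';
$p->normalize();
# This should print 0.  But it prints 1.
var_dump($p->childNodes->length);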


Expected result:
----------------
int(0)

Actual result:
--------------
int(1)
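
For builds that still show the behaviour above, the empty text nodes can be removed in userland after calling normalize(). The following is only a minimal sketch of that idea: removeEmptyTextNodes() is a hypothetical helper name, not part of the DOM extension or of any patch for this bug, and it assumes an empty text node reports an empty (or null) nodeValue.

```php
<?php
// Hypothetical helper (not part of PHP): removes zero-length text nodes
// from $node and from all of its element descendants.
function removeEmptyTextNodes(DOMNode $node): void
{
    // Copy the live node list first; removing children while iterating
    // the live list can skip nodes.
    $children = iterator_to_array($node->childNodes);
    foreach ($children as $child) {
        if ($child->nodeType === XML_TEXT_NODE && (string) $child->nodeValue === '') {
            $node->removeChild($child);
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            removeEmptyTextNodes($child);
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<p id="x">foo</p>');
$p = $doc->getElementById('x');
$p->childNodes->item(0)->textContent = '';

$p->normalize();           // merges adjacent text nodes
removeEmptyTextNodes($p);  // drops any empty text node that is still present

var_dump($p->childNodes->length);
```

On a build where normalize() already removes empty text nodes, the helper simply finds nothing to remove, so the call is harmless either way.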

History

 [2019-09-22 22:51 UTC] beberlei@php.net
Confirmed. The code for this case is missing from dom_normalize() in php_dom.c.
 [2020-03-11 12:09 UTC] cmb@php.net
The following pull request has been associated:

Patch Name: Fix #78221: DOMNode::normalize() doesn't remove empty text nodes
On GitHub:  https://github.com/php/php-src/pull/5254
Patch:      https://github.com/php/php-src/pull/5254.patch
 [2020-04-07 11:10 UTC] cmb@php.net
Automatic comment on behalf of cmbecker69@gmx.de
Revision: http://git.php.net/?p=php-src.git;a=commit;h=efec22b7bedfb1eae2df72b84cf5ad229e0bdc1e
Log: Fix #78221: DOMNode::normalize() doesn't remove empty text nodes
 [2020-04-07 11:10 UTC] cmb@php.net
-Status: Open
+Status: Closed
 [2020-06-05 15:03 UTC] divinity76 at gmail dot com
I'm not entirely sure this should have been fixed in a minor release. It might break HTML DOM traversal code in the wild, such as ->nextSibling->nextSibling->blah, which would make it a breaking change, wouldn't it?
 [2020-06-05 15:08 UTC] divinity76 at gmail dot com
s/minor release/patch release
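
Traversal code of the kind mentioned above counts on a fixed number of sibling nodes, so it changes meaning whenever a text node appears or disappears. As a rough sketch of a more defensive approach, the hypothetical helper nextElement() below (an illustrative name, not part of the DOM extension) walks forward to the next element sibling and ignores any text nodes in between. PHP 8.0 and later also expose a nextElementSibling property on DOMElement for the same purpose.

```php
<?php
// Hypothetical helper (not part of PHP): returns the next sibling that is
// an element, skipping text nodes, comments and other non-element nodes.
function nextElement(DOMNode $node): ?DOMElement
{
    for ($n = $node->nextSibling; $n !== null; $n = $n->nextSibling) {
        if ($n instanceof DOMElement) {
            return $n;
        }
    }
    return null;
}

$doc = new DOMDocument();
$doc->loadXML('<r><a/><b/></r>');
$root = $doc->documentElement;
$a = $root->firstChild;
$b = $root->lastChild;

// Put an empty text node between <a/> and <b/>.
$root->insertBefore($doc->createTextNode(''), $b);

$before = nextElement($a)->nodeName;  // lookup with the empty text node in place
$root->normalize();
$after = nextElement($a)->nodeName;   // lookup after normalize()

var_dump($before, $after);
```

Because the helper never counts siblings, the lookup from $a gives the same element whether or not normalize() removed the empty text node.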
 
Last updated: Thu Nov 21 09:01:32 2024 UTC