Bug #41374 wholetext concats values of wrong nodes
Submitted: 2007-05-12 09:34 UTC
Modified: 2007-05-14 11:46 UTC
From: vesselin at awcreator dot com
Assigned: rrichards
Status: Closed
Package: DOM XML related
PHP Version: 5.2.2
OS: Linux
Private report: No
CVE-ID: None
 [2007-05-12 09:34 UTC] vesselin at awcreator dot com
Description:
------------
DOMDocument->loadHTML() appears to load some text nodes twice when parsing HTML documents. Please note that formatting and whitespace in the loaded HTML are important. For example the bug does not show if the <h1> tag in the sample code is not followed by spaces/tabs.

Reproduce code:
---------------
<?php
function dump_node ($node)
{
	for (
		$child = $node->firstChild;
		$child !== null;
		$child = $child->nextSibling
	) {
		printf ("NODE TYPE: %s\n", $child->nodeType);
		switch ($child->nodeType) {
		case XML_ELEMENT_NODE:
			printf ("TYPE: ELEMENT, TAG: \"%s\"\n", $child->tagName);
			dump_node ($child);
			break;
		case XML_TEXT_NODE:
			printf ("TYPE TEXT, TEXT: \"%s\"\n", htmlspecialchars ($child->wholeText));
			break;
		}
	}
}

$html = <<<EOF
<html>
<body>
<table>
<tr>
<td>
          <h1>Left col</h1>Some generic text
</td>
</tr>
</table>
</body>
</html>
EOF;

$document = new DOMDocument ();
$document->resolveExternals = true;
$document->loadHTML ($html);
dump_node ($document);
?>


Expected result:
----------------
A dump of all document nodes and only one text node that has "Some generic text" as data.

Actual result:
--------------
A dump of all document nodes and two text nodes that have "Some generic text" as data.
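
To tell whether the parser really created a duplicate node or whether only the wholeText property reports too much, it can help to print each text node's own data next to its wholeText. The following is a minimal sketch, not code from the original report: the helper name collect_text_nodes is hypothetical, and it uses type declarations that need PHP 7.0 or later rather than the 5.2.2 the report was filed against. It assumes that data holds only the node's own characters, while wholeText is meant to cover logically adjacent text nodes.

```php
<?php
// Hypothetical helper (not part of the report): walk the tree and record,
// for every text node, its own data and the value of wholeText.
function collect_text_nodes(DOMNode $node, array &$out = []): array
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $out[] = ['data' => $child->data, 'wholeText' => $child->wholeText];
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            collect_text_nodes($child, $out);
        }
    }
    return $out;
}

// Same markup as the reproduce code, collapsed onto fewer lines but keeping
// the whitespace in front of <h1>.
$html = "<html><body><table><tr><td>\n          <h1>Left col</h1>"
      . "Some generic text\n</td></tr></table></body></html>";

$document = new DOMDocument();
$document->loadHTML($html);

$rows = collect_text_nodes($document);
foreach ($rows as $row) {
    // json_encode makes newlines and leading spaces visible in the output.
    printf("data: %s | wholeText: %s\n",
        json_encode($row['data']), json_encode($row['wholeText']));
}
```

If "Some generic text" shows up in the data column once but in the wholeText column more than once, the document tree itself is intact and only the wholeText value is affected.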

History

 [2007-05-12 09:37 UTC] vesselin at awcreator dot com
Actually this sentence:
"For example the bug does not show if the <h1> tag in the sample code is not followed by spaces/tabs."
should be read as:
"For example the bug does not show if the <h1> tag in the sample code is not PRECEDED by spaces/tabs."
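
The whitespace dependence described in this correction can be checked by running the same markup with and without indentation in front of the <h1> tag. This is a sketch under assumptions: the helper count_whole_text_matches and the variant labels are hypothetical names, and the syntax targets PHP 7.0 or later. It counts the text nodes whose wholeText contains the string in question.

```php
<?php
// Hypothetical helper: count text nodes under $node whose wholeText
// contains $needle.
function count_whole_text_matches(DOMNode $node, string $needle): int
{
    $count = 0;
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            if (strpos($child->wholeText, $needle) !== false) {
                $count++;
            }
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            $count += count_whole_text_matches($child, $needle);
        }
    }
    return $count;
}

// %s is replaced by the whitespace placed directly in front of <h1>.
$template = "<html><body><table><tr><td>\n%s<h1>Left col</h1>"
          . "Some generic text\n</td></tr></table></body></html>";

$variants = [
    'indented'     => sprintf($template, '          '),
    'not indented' => sprintf($template, ''),
];

foreach ($variants as $label => $html) {
    $document = new DOMDocument();
    $document->loadHTML($html);
    printf("%s: %d\n", $label,
        count_whole_text_matches($document, 'Some generic text'));
}
```

Going by the expected result in the report, a build that contains the fix should print a count of 1 for both variants; on an affected build the indented variant would be the one to show a higher count.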
 [2007-05-12 13:43 UTC] rrichards@php.net
change summary and assign to self
 [2007-05-14 11:46 UTC] rrichards@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 