Bug #41374 wholetext concats values of wrong nodes
Submitted: 2007-05-12 09:34 UTC
Modified: 2007-05-14 11:46 UTC
From: vesselin at awcreator dot com
Assigned: rrichards
Status: Closed
Package: DOM XML related
PHP Version: 5.2.2
OS: Linux
Private report: No
CVE-ID: None
 [2007-05-12 09:34 UTC] vesselin at awcreator dot com
Description:
------------
When an HTML document is loaded via DOMDocument->loadHTML(), some text nodes appear to be loaded twice. Note that the formatting and whitespace of the loaded HTML matter: for example, the bug does not show if the <h1> tag in the sample code is not followed by spaces/tabs.

Reproduce code:
---------------
<?php
function dump_node ($node)
{
	for (
		$child = $node->firstChild;
		$child !== null;
		$child = $child->nextSibling
	) {
		printf ("NODE TYPE: %s\n", $child->nodeType);
		switch ($child->nodeType) {
		case XML_ELEMENT_NODE:
			printf ("TYPE: ELEMENT, TAG: \"%s\"\n", $child->tagName);
			dump_node ($child);
			break;
		case XML_TEXT_NODE:
			printf ("TYPE TEXT, TEXT: \"%s\"\n", htmlspecialchars ($child->wholeText));
			break;
		}
	}
}

$html = <<<EOF
<html>
<body>
<table>
<tr>
<td>
          <h1>Left col</h1>Some generic text
</td>
</tr>
</table>
</body>
</html>
EOF;

$document = new DOMDocument ();
$document->resolveExternals = true;
$document->loadHTML ($html);
dump_node ($document);
?>


Expected result:
----------------
A dump of all document nodes and only one text node that has "Some generic text" as data.

Actual result:
--------------
A dump of all document nodes and two text nodes that have "Some generic text" as data.
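
A minimal sketch of how the symptom could be narrowed down, assuming PHP 7 or later (the type declarations would not parse on 5.2.2). It builds the same <td> content by hand instead of going through loadHTML(), and prints each text node's own data next to its wholeText. The helper name collect_text is hypothetical and not part of the original report. Since wholeText is meant to cover only logically adjacent text nodes, and here an <h1> element separates the two text nodes in the cell, the two values should match for every node on an unaffected build.

```php
<?php
// Hypothetical helper (not from the original report): walks the subtree
// under $node and records, for every text node, its own data and wholeText.
function collect_text(DOMNode $node, array &$out = []): array
{
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $out[] = ['data' => $child->data, 'wholeText' => $child->wholeText];
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            collect_text($child, $out);
        }
    }
    return $out;
}

// Same shape as the <td> in the reproduce code: whitespace, <h1>, text.
$doc = new DOMDocument();
$td  = $doc->appendChild($doc->createElement('td'));
$td->appendChild($doc->createTextNode("\n          "));
$td->appendChild($doc->createElement('h1', 'Left col'));
$td->appendChild($doc->createTextNode('Some generic text'));

foreach (collect_text($td) as $entry) {
    // A node whose wholeText differs from its data is picking up other nodes.
    printf(
        "data=%s wholeText=%s %s\n",
        json_encode($entry['data']),
        json_encode($entry['wholeText']),
        $entry['data'] === $entry['wholeText'] ? 'OK' : 'MISMATCH'
    );
}
```

Printing data (or nodeValue) instead of wholeText in the reproduce code would likewise show whether the duplication comes from the parsed tree or from the wholeText property.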

History

 [2007-05-12 09:37 UTC] vesselin at awcreator dot com
Correction: this sentence:
"For example the bug does not show if the <h1> tag in the sample code is not followed by spaces/tabs."
should read:
"For example the bug does not show if the <h1> tag in the sample code is not PRECEDED by spaces/tabs."
 [2007-05-12 13:43 UTC] rrichards@php.net
Changed summary and assigned to self.
 [2007-05-14 11:46 UTC] rrichards@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 