[2007-05-12 09:37 UTC] vesselin at awcreator dot com
[2007-05-12 13:43 UTC] rrichards@php.net
[2007-05-14 11:46 UTC] rrichards@php.net
Copyright © 2001-2025 The PHP Group. All rights reserved.
Last updated: Mon Oct 27 08:00:01 2025 UTC |
Description:
------------
When an HTML document is loaded via DOMDocument->loadHTML(), some text nodes are incorrectly loaded twice. Note that formatting and whitespace in the loaded HTML matter: for example, the bug does not appear if the <h1> tag in the sample code is not followed by spaces/tabs.

Reproduce code:
---------------
<?php
// Recursively print the type of every child node, plus the tag name
// for elements and the text for text nodes.
function dump_node ($node)
{
    for ( $child = $node->firstChild; $child !== null; $child = $child->nextSibling )
    {
        printf ("NODE TYPE: %s\n", $child->nodeType);
        switch ($child->nodeType)
        {
            case XML_ELEMENT_NODE:
                printf ("TYPE: ELEMENT, TAG: \"%s\"\n", $child->tagName);
                dump_node ($child);
                break;
            case XML_TEXT_NODE:
                printf ("TYPE TEXT, TEXT: \"%s\"\n", htmlspecialchars ($child->wholeText));
                break;
        }
    }
}

$html = <<<EOF
<html>
<body>
<table>
<tr>
<td>
<h1>Left col</h1>Some generic text
</td>
</tr>
</table>
</body>
</html>
EOF;

$document = new DOMDocument ();
$document->resolveExternals = true;
$document->loadHTML ($html);
dump_node ($document);
?>

Expected result:
----------------
A dump of all document nodes, with only one text node that has "Some generic text" as data.

Actual result:
--------------
A dump of all document nodes, with two text nodes that have "Some generic text" as data.
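A possible follow-up check, offered only as a sketch: the reproduce code prints wholeText, which per the DOM specification holds the text of all logically adjacent text nodes, not just the node's own data. Listing nodeValue next to wholeText for each text node may help tell whether the text is really present twice in the tree or is only reported twice. The helper name text_node_report and the compact sample markup are assumptions made for illustration and are not part of the original report.

```php
<?php
// Hypothetical helper (not from the original report): for every
// non-whitespace text node, return its own data (nodeValue) next to
// wholeText, which also includes logically adjacent text nodes.
function text_node_report(DOMDocument $document): array
{
    $xpath = new DOMXPath($document);
    $rows = [];
    foreach ($xpath->query('//text()') as $text) {
        if (trim($text->nodeValue) === '') {
            continue; // skip whitespace-only nodes
        }
        $rows[] = [$text->nodeValue, $text->wholeText];
    }
    return $rows;
}

// Assumed sample markup, loosely following the report's HTML.
$html = "<html><body><table><tr><td>\n"
      . "<h1>Left col</h1>Some generic text\n"
      . "</td></tr></table></body></html>";

$document = new DOMDocument();
$document->loadHTML($html);

foreach (text_node_report($document) as [$own, $whole]) {
    printf("nodeValue: \"%s\" | wholeText: \"%s\"\n",
        htmlspecialchars(trim($own)), htmlspecialchars(trim($whole)));
}
```

If "Some generic text" shows up in nodeValue for two separate rows, the text is duplicated in the tree itself; if it appears only once in nodeValue but in wholeText for more than one row, the repetition comes from adjacent text nodes being concatenated.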