[2015-11-18 10:54 UTC] sashott at abv dot bg
Description:
------------
DOMDocument builds a wrong tree when a script element contains a tag inside a string variable. For example, inside a DIV, a closing </div> tag written in a string variable within a script element closes the enclosing DIV.
Test script:
---------------
<?php
$html_content='<!DOCTYPE html>
<html>
<body>
<div id="something">
<script>
var somevar=\'<div></div>\';
</script>
<div></div>
<div></div>
</div>
</body>
</html>
';
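// Collect libxml parse errors internally instead of emitting warnings.
$oldSetting = libxml_use_internal_errors(true);
libxml_clear_errors();
$html = new DOMDocument();
$html->loadHTML($html_content);
$node = $html->getElementById('something');
// Print the node name of each direct child of <div id="something">.
echo "<pre>";
foreach ($node->childNodes as $child_node) {
    echo $child_node->nodeName . "\n";
}
echo "</pre>";
// Restore the previous libxml error-handling setting.
libxml_clear_errors();
libxml_use_internal_errors($oldSetting);
// Result:
// #text
// script
// The error is triggered by this line: var somevar=\'<div></div>\';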
?>
Expected result:
----------------
script
div
div
// Some #text nodes may appear between them.
Actual result:
--------------
#text
script
Bug confirmed, and I would like to add another caveat. When a script tag has type "text/html", a broken tree becomes much more likely. I also tried wrapping the content in <![CDATA[ ... ]]>, but this does not solve the problem.
Example:
----------- html.html -----------
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>Titolo</title>
<meta name="description" content="aaaa">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<div class="wrapper">
<header></header>
<main>
<script type="text/html" id="ciao2"><![CDATA[
<div>DDD</div>
]]></script>
</main>
<footer></footer>
</div>
<script type="text/html" id="ciao">
<div>AAAA</div>
</script>
<script>
console.log(document.getElementById('ciao'), document.getElementById('ciao2'), '<div>ZZZZ</div>');
</script>
<div>BBBB</div>
</body>
</html>
----------- test.php -----------
<?php
$html = file_get_contents('html.html');
// loadHTML() is an instance method; calling it statically fails on PHP 8.
$dom = new DOMDocument();
$dom->loadHTML($html);
echo $dom->saveHTML();
--------------------------------