Bug #54429 domDocument::loadHTML adds extra CDATA tags causing script errors.
Submitted: 2011-03-31 06:23 UTC Modified: 2018-07-08 14:26 UTC
Votes:3
Avg. Score:4.0 ± 0.8
Reproduced:3 of 3 (100.0%)
Same Version:2 (66.7%)
Same OS:0 (0.0%)
From: ken at smallboxcms dot com Assigned: cmb (profile)
Status: Not a bug Package: DOM XML related
PHP Version: 5.3.6 OS: Centos 5.5
Private report: No CVE-ID: None
 [2011-03-31 06:23 UTC] ken at smallboxcms dot com
Description:
------------
This one has annoyed me for a long time. When you load HTML using loadHTML, the output has extra CDATA tags inside the script tags, which causes browser errors. This does not happen when loadXML is used.

Test script:
---------------
<?php

$foo = '
<html>
<head>
<title></title>
</head>
<body>
<script type="text/javascript">
//<![CDATA[

alert("foo bar");

//]]>
</script>

</body>
</html>
';

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($foo);
echo $dom->saveXML();

?>

Expected result:
----------------
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><title/></head><body>
<script type="text/javascript">
//<![CDATA[

alert("foo bar");

//]]>
</script></body></html>


Actual result:
--------------
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><title/></head><body>
<script type="text/javascript"><![CDATA[
//<![CDATA[

alert("foo bar");

//]]]]><![CDATA[>
]]></script></body></html>


 [2011-03-31 06:27 UTC] rasmus@php.net
-Status: Open +Status: Suspended
 [2011-03-31 06:27 UTC] rasmus@php.net
You would need to file this against the libxml2 library as this isn't something 
we can fix at the PHP level. See http://xmlsoft.org/bugs.html
 [2011-03-31 06:42 UTC] ken at smallboxcms dot com
This may be this bug which appears to have sat unconfirmed for 5 years: 
https://bugzilla.gnome.org/show_bug.cgi?id=357992
 [2011-03-31 07:02 UTC] ken at smallboxcms dot com
Here is a userland workaround: 

$list = $dom->getElementsByTagName('script');
foreach ($list as $script) {
    // The script body is stored as a CDATA section node; swap it for a plain text node
    if ($script->childNodes->length && $script->firstChild->nodeType == XML_CDATA_SECTION_NODE) {
        $cdata = $script->removeChild($script->firstChild);
        $text = $dom->createTextNode($cdata->nodeValue);
        $script->appendChild($text);
    }
}
 [2013-01-01 08:13 UTC] i_me_you2002 at yahoo dot com
This is not a bug -- the problem is fn:saveXML() is being used where fn:saveHTML 
must be used. Also, fn:$DOMDocument->save is the XML version, which will also 
result in the same problem.
 [2018-07-08 14:26 UTC] cmb@php.net
-Status: Suspended +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2018-07-08 14:26 UTC] cmb@php.net
> This is not a bug -- the problem is fn:saveXML() is being used
> where fn:saveHTML must be used.

That.  See <https://3v4l.org/TfpK3>.
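
For readers who cannot open the link, here is a minimal sketch of the point made in the comments, reusing the markup from the test script. It is not the content of the 3v4l snippet, and the variable names ($html, $asXml, $asHtml) are illustrative only. It serializes the same document with both saveXML() and saveHTML(); the exact saveXML() output may vary with the libxml2 version, so none is shown here.

```php
<?php
// Same markup as the test script in the report.
$html = '<html><head><title></title></head><body>
<script type="text/javascript">
//<![CDATA[
alert("foo bar");
//]]>
</script>
</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// XML serializer: in the report this wraps the script body in an
// additional CDATA section, which breaks the script in browsers.
$asXml = $dom->saveXML();

// HTML serializer: writes the script body back without adding
// an XML CDATA wrapper around it.
$asHtml = $dom->saveHTML();

echo $asHtml;
```

If XML output is genuinely required, the userland workaround posted earlier in this thread is the alternative; otherwise serializing with saveHTML() (or saveHTMLFile()) avoids the issue.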
 