Bug #54429 domDocument::loadHTML adds extra CDATA tags causing script errors.
Submitted: 2011-03-31 06:23 UTC Modified: 2018-07-08 14:26 UTC
Votes:3
Avg. Score:4.0 ± 0.8
Reproduced:3 of 3 (100.0%)
Same Version:2 (66.7%)
Same OS:0 (0.0%)
From: ken at smallboxcms dot com Assigned: cmb (profile)
Status: Not a bug Package: DOM XML related
PHP Version: 5.3.6 OS: Centos 5.5
Private report: No CVE-ID: None
 [2011-03-31 06:23 UTC] ken at smallboxcms dot com
Description:
------------
This one has annoyed me for a long time. When you load HTML using loadHTML and then serialize the document, the contents of each script tag are wrapped in an extra CDATA section, which causes script errors in the browser. This does not happen when loadXML is used.





Test script:
---------------
<?php

$foo = '
<html>
<head>
<title></title>
</head>
<body>
<script type="text/javascript">
//<![CDATA[

alert("foo bar");

//]]>
</script>

</body>
</html>
';

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($foo);
echo $dom->saveXML();

?>

Expected result:
----------------
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><title/></head><body>
<script type="text/javascript">
//<![CDATA[

alert("foo bar");

//]]>
</script></body></html>


Actual result:
--------------
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><title/></head><body>
<script type="text/javascript"><![CDATA[
//<![CDATA[

alert("foo bar");

//]]]]><![CDATA[>
]]></script></body></html>
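
Editorial note on the `]]]]><![CDATA[>` sequence above: an XML CDATA section cannot contain the literal string `]]>`, so an XML serializer has to split the section wherever that string occurs. The following is a minimal sketch, assuming the classic `DOMDocument` API backed by libxml2; the element name and node content are chosen only for illustration and are not taken from the report. It builds a CDATA node by hand and serializes it with saveXML(), which should produce the same split.

```php
<?php
// Build a tiny document with a CDATA section whose content contains "]]>".
$doc = new DOMDocument('1.0', 'utf-8');
$root = $doc->appendChild($doc->createElement('script'));
$root->appendChild($doc->createCDATASection("//]]>\n"));

// Serialize only the <script> element as XML. Because "]]>" would end the
// section early, the serializer closes the section after "]]" and opens a
// new one for the remaining ">".
$xml = $doc->saveXML($root);
echo $xml;
```

This suggests that the doubled markers come from XML serialization of a CDATA node, not from anything loadHTML adds to the script text itself.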


History

 [2011-03-31 06:27 UTC] rasmus@php.net
-Status: Open +Status: Suspended
 [2011-03-31 06:27 UTC] rasmus@php.net
You would need to file this against the libxml2 library as this isn't something 
we can fix at the PHP level. See http://xmlsoft.org/bugs.html
 [2011-03-31 06:42 UTC] ken at smallboxcms dot com
This may be the following bug, which appears to have sat unconfirmed for 5 years:
https://bugzilla.gnome.org/show_bug.cgi?id=357992
 [2011-03-31 07:02 UTC] ken at smallboxcms dot com
Here is a userland workaround: 

// Replace the leading CDATA section node of each <script> with a plain text node.
$list = $dom->getElementsByTagName('script');
foreach ($list as $script) {
    if ($script->childNodes->length && $script->firstChild->nodeType == XML_CDATA_SECTION_NODE) {
        $cdata = $script->removeChild($script->firstChild);
        $text = $dom->createTextNode($cdata->nodeValue);
        $script->appendChild($text);
    }
}
 [2013-01-01 08:13 UTC] i_me_you2002 at yahoo dot com
This is not a bug -- the problem is fn:saveXML() is being used where fn:saveHTML 
must be used. Also, DOMDocument::save() is the XML version, so it will 
result in the same problem.
 [2018-07-08 14:26 UTC] cmb@php.net
-Status: Suspended +Status: Not a bug -Assigned To: +Assigned To: cmb
 [2018-07-08 14:26 UTC] cmb@php.net
> This is not a bug -- the problem is fn:saveXML() is being used
> where fn:saveHTML must be used.

That.  See <https://3v4l.org/TfpK3>.
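
For readers who cannot open the link, here is a minimal sketch of the suggested fix. It is not a copy of the linked snippet; it simply reuses the test script from the report and swaps saveXML() for saveHTML(). The variable name `$html` is introduced here for illustration.

```php
<?php
// Same input as the test script in the report.
$foo = '
<html>
<head>
<title></title>
</head>
<body>
<script type="text/javascript">
//<![CDATA[

alert("foo bar");

//]]>
</script>

</body>
</html>
';

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($foo);

// saveHTML() serializes with HTML rules, so the script content is written
// as-is and is not wrapped in an additional CDATA section.
$html = $dom->saveHTML();
echo $html;
```

The same applies when writing to disk: DOMDocument::saveHTMLFile() is the HTML counterpart of DOMDocument::save().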
 