Bug #74241 DomDocument::loadHTML ignores LIBXML_NOCDATA
Submitted: 2017-03-13 10:42 UTC Modified: 2017-03-13 10:57 UTC
From: tom at r dot je Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 7.1.2 OS: Linux
Private report: No CVE-ID: None
 [2017-03-13 10:42 UTC] tom at r dot je
Description:
------------
Using PHP 7.1 with libxml 2.9.4, DOMDocument::loadHTML() ignores LIBXML_NOCDATA.


Test script:
---------------
<?php
$doc = new \DomDocument();

$doc->loadHTML('<!doctype html>
<html><head>

<script>
$().ready(function() {});
</script>
</head>
<body>
</body>
</html>
', LIBXML_NOCDATA);

echo $doc->saveXML(null, LIBXML_NOCDATA);


?>

Output:

---------

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html>
<html><head>

<script><![CDATA[
$().ready(function() {});
]]></script>
</head>
<body>
</body>
</html>

---------


Changing `loadHTML` to `loadXML` works as expected: no <![CDATA[ appears in the output.

`saveHTML` also omits the CDATA tags, but that doesn't help here: I'm working with document fragments, and it wraps everything in html and body tags.


History

 [2017-03-13 10:57 UTC] requinix@php.net
-Status: Open
+Status: Not a bug
 [2017-03-13 10:57 UTC] requinix@php.net
LIBXML_NOCDATA matters at load time, not at save time, and loadHTML() does not respect it because CDATA sections make no sense in HTML.
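
To illustrate the load-time point, here is a minimal sketch using loadXML(), where the flag does apply. The sample XML string and the variable names are made up for this example; it only shows that the CDATA section becomes an ordinary text node when the document is parsed, so nothing is left for the save step to strip.

```php
<?php
// Hypothetical sample input: one element holding a CDATA section.
$xml = '<root><![CDATA[a < b]]></root>';

// Parsed without the flag: the CDATA section stays a CDATA node.
$plain = new DOMDocument();
$plain->loadXML($xml);

// Parsed with LIBXML_NOCDATA: the CDATA section is merged into a text node.
$merged = new DOMDocument();
$merged->loadXML($xml, LIBXML_NOCDATA);

var_dump($plain->documentElement->firstChild->nodeType === XML_CDATA_SECTION_NODE);
var_dump($merged->documentElement->firstChild->nodeType === XML_TEXT_NODE);

// Serializing the root element shows the difference in markup.
echo $plain->saveXML($plain->documentElement), "\n";
echo $merged->saveXML($merged->documentElement), "\n";
```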

Loading as HTML but saving as XML, or vice versa, is incorrect: either the content is HTML or it is XML. If you want to deal with HTML fragments without the DTD and implied <html> elements then use the LIBXML_HTML_NODEFDTD and LIBXML_HTML_NOIMPLIED options with loadHTML().
https://3v4l.org/8MnED
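
A rough sketch of the suggested approach follows. It is not necessarily the same script as the linked one; the fragment markup and variable names are assumptions for illustration. The fragment is given a single wrapper element because LIBXML_HTML_NOIMPLIED tends to behave poorly when a fragment has several top-level nodes.

```php
<?php
// Hypothetical fragment with a single wrapper element around the content.
$fragment = '<div><script>$().ready(function() {});</script><p>Hello</p></div>';

$doc = new DOMDocument();
// NOIMPLIED: do not add implied <html>/<body> elements.
// NODEFDTD: do not add a default doctype when none is present.
$doc->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Loaded as HTML, so it is saved as HTML as well.
$html = $doc->saveHTML();
echo $html;
```

Because the document is serialized as HTML rather than XML, the script body should come out without a CDATA wrapper, and no html or body tags should be added around the fragment.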
 