Bug #74863 php DOMDocument loadHTML parses html tags inside comments
Submitted: 2017-07-06 09:00 UTC
Modified: 2017-07-06 09:04 UTC
From: qdinar at gmail dot com
Assigned:
Status: Duplicate
Package: *General Issues
PHP Version: 5.6.30
OS: windows 10
Private report: No
CVE-ID: None
 [2017-07-06 09:00 UTC] qdinar at gmail dot com
Description:
------------
HTML comment markers (<!-- and -->), which are used to hide HTML tags that appear inside strings within script elements, fail to do that when the markup is fed to PHP's DOMDocument::loadHTML(), which is built on libxml.

As the example shows, the remainder of the script element's content, "'; c='456'; // -->", ends up outside the script element and would be output to the user as page text.

I found this by trying plain HTML comments after reading the comments on bug https://bugs.php.net/bug.php?id=74858 .


Test script:
---------------
<?php
// The script body below contains "</script>" inside a JavaScript string literal.
$d = new DOMDocument();
$test_content='
<script>
//<!-- 
a=\'123\';
b=\'</script>\';
c=\'456\';
// -->
</script>
';
// loadHTML() emits the warning; saveHTML() then prints the tree as it was parsed.
$d->loadHTML($test_content);
echo $d->saveHTML();


Expected result:
----------------
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script>
//<!-- 
a='123';
b='</script>';
c='456';
// -->
</script></head><body>
</body></html>


Actual result:
--------------
PHP Warning:  DOMDocument::loadHTML(): Unexpected end tag : script in Entity, line: 8 in C:\xampp\htdocs\test\dom_load_html.php on line 38

Warning: DOMDocument::loadHTML(): Unexpected end tag : script in Entity, line: 8 in C:\xampp\htdocs\test\dom_load_html.php on line 38
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script>
//<!--
a='123';
b='</script></head><body><p>';
c='456';
// --&gt;
</p></body></html>


History

 [2017-07-06 09:04 UTC] requinix@php.net
-Status: Open
+Status: Duplicate
 [2017-07-06 09:04 UTC] requinix@php.net
Read what I said in #71452.
 [2017-07-06 09:11 UTC] gooh@php.net
From https://www.w3.org/TR/html4/types.html#type-cdata

> Although the STYLE and SCRIPT elements use CDATA for their data model, for these elements, CDATA must be handled differently by user agents. Markup and entities must be treated as raw text and passed to the application as is. The first occurrence of the character sequence "</" (end-tag open delimiter) is treated as terminating the end of the element's content. In valid documents, this would be the end tag for the element.
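
Given the rule quoted above, one common workaround is to make sure the sequence "</" never appears literally inside the script body, for example by writing it as "<\/" in JavaScript string literals, where the two forms are equivalent. The following is only a minimal sketch of that idea: `escapeScriptEndTags` is a hypothetical helper, not part of PHP or libxml, and the naive replacement assumes that "</" occurs only inside JavaScript string literals.

```php
<?php
// Hypothetical helper (an assumption for illustration, not a PHP or libxml API):
// rewrite "</" as "<\/" so the HTML parser never sees an end-tag open delimiter
// inside the script body. In a JavaScript string literal "<\/" still means "</".
function escapeScriptEndTags(string $js): string
{
    return str_replace('</', '<\/', $js);
}

// Same script body as in the test script, without the comment markers.
$js = "a='123';\nb='</script>';\nc='456';";

$html = "<html><head><script>\n"
      . escapeScriptEndTags($js)
      . "\n</script></head><body></body></html>";

// Collect libxml messages instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$d = new DOMDocument();
$d->loadHTML($html);
libxml_clear_errors();

// Read back what the parser kept inside the script element.
$scriptText = $d->getElementsByTagName('script')->item(0)->textContent;
$paragraphCount = $d->getElementsByTagName('p')->length;

echo $scriptText;
```

If the escaping works as intended, the whole script body, including the line that assigns `c`, stays inside the script element and no stray paragraph is created in the body.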
 