Bug #71452 Error parsing js strings which include tags.
Submitted: 2016-01-26 10:38 UTC Modified: 2017-07-05 07:31 UTC
Avg. Score:4.0 ± 1.2
Reproduced:6 of 6 (100.0%)
Same Version:2 (33.3%)
Same OS:3 (50.0%)
From: php at carl-ambroselli dot de Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.5.31 OS: Tested MacOS & Debian
Private report: No CVE-ID: None
 [2016-01-26 10:38 UTC] php at carl-ambroselli dot de
DOMDocument mis-parses a document that contains JavaScript strings with HTML tags inside them.

Test script:
$doc = new DomDocument();
$doc->loadHTML('<html><body><script>var p = \'<p>123</p>\'</script></body></html>');
echo $doc->saveHTML();

Expected result:
<html><body><script>var p = '<p>123</p>'</script></body></html>

Actual result:
<html><body><script>var p = '<p>123'</script></body></html>

(The closing </p> tag got removed inside a JavaScript string)


 [2017-07-05 06:35 UTC] qdinar at gmail dot com
I can confirm this for PHP 5.6.30 on Windows 10.
 [2017-07-05 06:48 UTC] qdinar at gmail dot com
I have reported another bug, because I found that it differs from this one by an additional CDATA tag.
 [2017-07-05 07:31 UTC]
-Status: Open +Status: Not a bug
 [2017-07-05 07:31 UTC]
This is an issue with libxml, not PHP.

HTML 4 prohibits the use of "</" in non-HTML data, meaning the "</p>" will be interpreted as an end tag. However, the "<p>" before it is ignored entirely. libxml recovers from the <script></p> mismatch by dropping the unexpected end tag, with a warning.

HTML 5 has slightly different rules which allow for "</p>", but libxml does not currently support HTML 5.

Best practice is still to escape fake end tags in <script>s. That applies in PHP too.
  var p = '<p>123<\/p>'
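
For markup that cannot be fixed at the source, the same escaping can be applied in PHP before the document reaches loadHTML(). The following is only a minimal sketch: escapeScriptEndTags() and parseAndReport() are hypothetical helper names, not part of PHP or libxml, and the regex assumes that "</" appears inside inline <script> bodies only within JavaScript string literals, where "<\/" is equivalent. The helper also prints whatever warnings libxml collected, so the dropped end tag becomes visible.

```php
<?php
// Hypothetical helper: rewrite "</" as "<\/" inside inline <script> bodies
// so that an HTML 4 parser does not see a stray end tag.
// Assumption: "</" only occurs inside JavaScript string literals.
function escapeScriptEndTags(string $html): string
{
    return preg_replace_callback(
        '~(<script\b[^>]*>)(.*?)(</script>)~is',
        function (array $m): string {
            // $m[1] = opening tag, $m[2] = script body, $m[3] = closing tag
            return $m[1] . str_replace('</', '<\/', $m[2]) . $m[3];
        },
        $html
    );
}

// Hypothetical helper: parse the markup, print the warnings libxml
// collected while parsing, and return the re-serialized document.
function parseAndReport(string $html): string
{
    libxml_use_internal_errors(true); // collect warnings instead of emitting them
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    foreach (libxml_get_errors() as $error) {
        echo 'libxml: ', trim($error->message), PHP_EOL;
    }
    libxml_clear_errors();

    return $doc->saveHTML();
}

$html = '<html><body><script>var p = \'<p>123</p>\'</script></body></html>';

// Unescaped input: libxml is expected to warn and drop the "</p>".
echo parseAndReport($html), PHP_EOL;

// Escaped input: the script body should survive the round trip.
echo parseAndReport(escapeScriptEndTags($html)), PHP_EOL;
```

Escaping at the source, as in the one-liner above, remains the simpler option; a regex pass like this one is a stopgap and will not handle every script body correctly.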
Last updated: Mon Sep 28 13:01:23 2020 UTC