Bug #70817 DOMDocument save removes html within a string
Submitted: 2015-10-30 00:44 UTC
Modified: 2018-06-27 11:49 UTC
Votes: 2
Avg. Score: 5.0 ± 0.0
Reproduced: 2 of 2 (100.0%)
Same Version: 1 (50.0%)
Same OS: 2 (100.0%)
From: untuned20 at gmail dot com
Assigned:
Status: Not a bug
Package: DOM XML related
PHP Version: 5.6.14
OS: Linux
Private report: No
CVE-ID: None
 [2015-10-30 00:44 UTC] untuned20 at gmail dot com
Description:
------------
When DOMDocument parses a document, HTML that appears inside strings is parsed and validated as markup, which probably shouldn't happen.

For example, when a `<script>` tag contains HTML inside a JavaScript string, the parser validates that HTML. In the example script, the parser removes the `</h1>`, leaving the invalid string "<h1>hello".

Test script:
---------------
<?php

$doc = new DOMDocument();

$doc->loadXML('<html>
    <head>
    </head>
    <body>
        <script>
            console.log("<h1>hello</h1>");
        </script>
    </body>
</html>');

echo $doc->saveHTML();

Expected result:
----------------
<html>
    <head>
    </head>
    <body>
        <script>
            console.log("<h1>hello</h1>");
        </script>
    </body>
</html>

Actual result:
--------------
<html>
    <head>
    </head>
    <body>
        <script>
            console.log("<h1>hello");
        </script>
    </body>
</html>

History

 [2015-10-30 00:49 UTC] untuned20 at gmail dot com
The test script above is wrong; it should use loadHTML() instead.

Test script:
---------------
<?php

$doc = new DOMDocument();

$doc->loadHTML('<html>
    <head>
    </head>
    <body>
        <script>
            console.log("<h1>hello</h1>");
        </script>
    </body>
</html>');

echo $doc->saveHTML();
 [2018-06-27 10:16 UTC] alexeyseliverstov dot dev at gmail dot com
I just tried this with PHP 7.1 on Ubuntu 16.04. For some reason, closing HTML tags inside the script tag are deleted.

This is the warning that appears when I try to execute the code example: "Warning: DOMDocument::loadHTML(): Unexpected end tag : h1 in Entity, line: 6 in Standard input code on line 5"
 [2018-06-27 10:24 UTC] googleguy@php.net
-Status: Open +Status: Not a bug
 [2018-06-27 10:24 UTC] googleguy@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

This works as expected in all currently supported versions of PHP: https://3v4l.org/UEDIL
 [2018-06-27 11:49 UTC] requinix@php.net
Technically loadHTML() is the more appropriate method; however, the parser follows HTML 4 rules, which say that a `</` cannot appear inside script content.
https://www.w3.org/TR/html4/appendix/notes.html#notes-specifying-data
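The HTML 4 note linked above recommends hiding the `</` delimiter inside script data; in JavaScript that means writing `<\/h1>`, which the script engine reads as the same string as `</h1>`. A minimal sketch of that workaround, assuming PHP's DOM extension and the libxml2 HTML parser that loadHTML() uses (the `$html` variable name is illustrative only):

```php
<?php
// Sketch (assumption: default libxml2 HTML parser behind loadHTML()).
// Hide the "</" (ETAGO) delimiter inside the script body, as the HTML 4
// note recommends. In a JavaScript string "<\/h1>" is the same text as
// "</h1>", but the HTML parser no longer sees an end tag there.
// A nowdoc (<<<'HTML') is used so PHP keeps the backslash literally.
$html = <<<'HTML'
<html>
    <head>
    </head>
    <body>
        <script>
            console.log("<h1>hello<\/h1>");
        </script>
    </body>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

// The script text should now survive the load/save round trip.
echo $doc->saveHTML();
```

Because the backslash is only a JavaScript escape, `console.log` still prints `<h1>hello</h1>` when the page runs in a browser.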
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Apr 18 22:01:28 2024 UTC