Bug #78129 DOMDocument removes whitespace between nodes
Submitted: 2019-06-08 14:31 UTC Modified: -
From: nospam at unclassified dot de Assigned:
Status: Open Package: *General Issues
PHP Version: 7.3.6 OS: Linux
Private report: No CVE-ID: None

 [2019-06-08 14:31 UTC] nospam at unclassified dot de
Description:
------------
DOMDocument removes whitespace between HTML elements when loading and saving HTML content, which causes the content to be displayed incorrectly. The bug only affects Linux; the same PHP version works correctly on Windows.

Test script:
---------------
<?php
$html = '<html>
<body>
    <p>
        Text
    </p>
    <lang>Logged in as</lang>
    <strong>user</strong>
    <script src="jquery-3.4.1.min.js"></script>
    <script src="site.js"></script>
</body>
</html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

header('Content-Type: text/plain');
echo $dom->saveHTML();


Expected result:
----------------
This is the output on Windows:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
    <p>
        Text
    </p>
    <lang>Logged in as</lang>
    <strong>user</strong>
    <script src="jquery-3.4.1.min.js"></script>
    <script src="site.js"></script>
</body>
</html>


Actual result:
--------------
This is the output on Linux:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
    <p>
        Text
    </p>
    <lang>Logged in as</lang><strong>user</strong>
    <script src="jquery-3.4.1.min.js"></script><script src="site.js"></script></body></html>

There is no space between "Logged in as" and "user". The <lang> element is a custom element that my PHP script will resolve into plain HTML.
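
A reduced check may make the difference easier to compare between the two systems. The sketch below is only an illustration: the helper name whitespaceKeptAfter() is hypothetical, and it assumes that a lost space shows up as a missing whitespace-only text node directly after the element. It prints a different line depending on the platform, so no expected output is given.

```php
<?php
// Hypothetical helper: reports whether a whitespace-only text node
// directly follows the first <$tag> element after parsing with loadHTML().
function whitespaceKeptAfter(string $html, string $tag): bool
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $node = $dom->getElementsByTagName($tag)->item(0);
    if ($node === null) {
        return false; // element not found, nothing to inspect
    }

    $next = $node->nextSibling;
    return $next instanceof DOMText && trim($next->data) === '';
}

// Suppress warnings about the unknown <lang> tag, as in the test script.
libxml_use_internal_errors(true);

$html = "<html>\n<body>\n    <lang>Logged in as</lang>\n    <strong>user</strong>\n</body>\n</html>";

echo whitespaceKeptAfter($html, 'lang')
    ? "whitespace after <lang> kept\n"
    : "whitespace after <lang> dropped\n";

libxml_clear_errors();
```

Running this on both machines should show directly whether the text node between <lang> and <strong> survives parsing.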

 [2019-06-08 14:41 UTC] nospam at unclassified dot de
It seems Linux always adds the LIBXML_NOBLANKS option whereas Windows does not. I can get the same result on Windows as on Linux when I use this code:

$dom->loadHTML($html, LIBXML_NOBLANKS);

But there is no option that inverts it. To me it looks like the Linux build of PHP has a bug where LIBXML_NOBLANKS is always applied when it should not be. I couldn't find information about the option because the libxml documentation has no description of it; the entry is just empty.
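
One way to test this guess is to compare the default output with the LIBXML_NOBLANKS output on the same machine, and to print the libxml version each build uses. This is a rough sketch, not a confirmed diagnosis: the render() helper is hypothetical, and the idea that the two builds link different libxml versions is an assumption.

```php
<?php
// Report the platform and the libxml version this PHP build uses
// (assumption: the Windows and Linux builds may differ here).
echo PHP_OS_FAMILY, ' / PHP ', PHP_VERSION, ' / libxml ', LIBXML_DOTTED_VERSION, PHP_EOL;

// Hypothetical helper: parse and serialize HTML with the given libxml options.
function render(string $html, int $options = 0): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html, $options);
    return (string) $dom->saveHTML();
}

// Suppress warnings about the unknown <lang> tag.
libxml_use_internal_errors(true);

$html = "<html>\n<body>\n    <lang>Logged in as</lang>\n    <strong>user</strong>\n</body>\n</html>";

$default  = render($html);
$noBlanks = render($html, LIBXML_NOBLANKS);

libxml_clear_errors();

// If both strings are identical, the default behaves as if LIBXML_NOBLANKS were set.
echo $default === $noBlanks
    ? "default output matches LIBXML_NOBLANKS output\n"
    : "default output differs from LIBXML_NOBLANKS output\n";
```

If the Linux build reports matching output and the Windows build does not, that would support the suspicion above; differing libxml versions would point at the library rather than at PHP itself.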
 
Last updated: Sat Sep 21 19:01:26 2019 UTC