Bug #78129 DOMDocument removes whitespace between nodes
Submitted: 2019-06-08 14:31 UTC
Modified: 2020-06-16 16:10 UTC
From: nospam at unclassified dot de
Assigned:
Status: Open
Package: DOM XML related
PHP Version: 7.3.6
OS: Linux
Private report: No
CVE-ID: None

 
 [2019-06-08 14:31 UTC] nospam at unclassified dot de
Description:
------------
DOMDocument removes whitespace between HTML elements when loading and saving HTML content, which causes the content to be displayed incorrectly. The bug only affects Linux; it works correctly on Windows (same PHP version).

Test script:
---------------
<?php
$html = '<html>
<body>
    <p>
        Text
    </p>
    <lang>Logged in as</lang>
    <strong>user</strong>
    <script src="jquery-3.4.1.min.js"></script>
    <script src="site.js"></script>
</body>
</html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

header('Content-Type: text/plain');
echo $dom->saveHTML();


Expected result:
----------------
This is the output on Windows:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
    <p>
        Text
    </p>
    <lang>Logged in as</lang>
    <strong>user</strong>
    <script src="jquery-3.4.1.min.js"></script>
    <script src="site.js"></script>
</body>
</html>


Actual result:
--------------
This is the output on Linux:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
    <p>
        Text
    </p>
    <lang>Logged in as</lang><strong>user</strong>
    <script src="jquery-3.4.1.min.js"></script><script src="site.js"></script></body></html>

There is no space between "Logged in as" and "user". The <lang> element is a custom element that my PHP script will resolve into plain HTML.
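
To narrow down whether the whitespace is lost while parsing or while saving, the parsed tree can be inspected directly. The following is only a minimal sketch: the helper name hasWhitespaceBetweenLangAndStrong() is hypothetical and not part of the original report, and the result may differ depending on the libxml2 build in use.

```php
<?php
// Hypothetical helper (not from the original report): returns true if a
// whitespace-only text node directly follows the first <lang> element
// in the parsed tree.
function hasWhitespaceBetweenLangAndStrong(string $html): bool
{
    // <lang> is not a known HTML tag, so silence the parser warning.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $lang = $dom->getElementsByTagName('lang')->item(0);
    if ($lang === null) {
        return false;
    }

    $next = $lang->nextSibling;
    return $next instanceof DOMText && trim($next->data) === '';
}

$html = '<html><body>
    <lang>Logged in as</lang>
    <strong>user</strong>
</body></html>';

// true: the whitespace node survived parsing; false: it was dropped on load.
var_dump(hasWhitespaceBetweenLangAndStrong($html));
```

If this prints bool(false) on the affected system, the whitespace is already missing after loadHTML(), so saveHTML() is not the step that removes it.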

 [2019-06-08 14:41 UTC] nospam at unclassified dot de
It seems Linux always adds the LIBXML_NOBLANKS option whereas Windows does not. I can get the same result on Windows as on Linux when I use this code:

$dom->loadHTML($html, LIBXML_NOBLANKS);

But there is no option that inverts it. To me it looks like the Linux build of PHP has a bug where the LIBXML_NOBLANKS option is always applied even though it should not be. I couldn't find information about that option because the libxml documentation has no description of it. It's just empty.
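
The comparison described above can be reproduced on any system with a short script. This is a sketch only: the roundTrip() helper is a hypothetical name introduced for illustration, and the shortened markup is taken from the test script in the report.

```php
<?php
$html = '<html><body>
    <lang>Logged in as</lang>
    <strong>user</strong>
</body></html>';

// Hypothetical helper: parse $html with the given libxml options and
// serialize it again.
function roundTrip(string $html, int $options = 0): string
{
    $previous = libxml_use_internal_errors(true); // <lang> is not valid HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html, $options);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return (string) $dom->saveHTML();
}

$default  = roundTrip($html);
$noBlanks = roundTrip($html, LIBXML_NOBLANKS);

echo "--- default options ---\n", $default, "\n";
echo "--- LIBXML_NOBLANKS ---\n", $noBlanks, "\n";
echo $default === $noBlanks
    ? "Both outputs are identical on this system.\n"
    : "The outputs differ on this system.\n";
```

Identical outputs on one machine and differing outputs on another would match the behaviour reported here, but they do not by themselves show which component is responsible.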
 [2019-11-02 19:20 UTC] nospam at unclassified dot de
Can somebody please have a quick look at this? To me this looks like a simple fix: just edit the one line that sets the option for the Linux build.
 [2020-06-07 15:33 UTC] nospam at unclassified dot de
Is this issue tracker still in use and is anybody here? It feels a little lonely in here.
 [2020-06-08 09:09 UTC] cmb@php.net
-Status: Open
+Status: Feedback
-Package: *General Issues
+Package: DOM XML related
-Assigned To:
+Assigned To: cmb
 [2020-06-08 09:09 UTC] cmb@php.net
I don't think this is related to the OS, but rather to the libxml2
version.  Which one do you use?  Could you try with the latest?
 [2020-06-14 17:42 UTC] nospam at unclassified dot de
-Status: Feedback
+Status: Assigned
 [2020-06-14 17:42 UTC] nospam at unclassified dot de
I'm not sure which version I have. The server is running Ubuntu 16.04. There are two packages with that name, they have the versions "2.9.3+dfsg1-1ubuntu0.7" and "2.9.3+dfsg1-1". I can't try with another version at the moment. Will try to build up another environment for that with Ubuntu 20.04, I have nothing newer.
 [2020-06-14 20:58 UTC] cmb@php.net
You can check which version is used by PHP with phpinfo().  Ubuntu
20.04 should have libxml 2.9.10 which would be the latest version
where this issue likely has been fixed.
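
Besides phpinfo(), the version can also be read from a script via the constants exposed by PHP's libxml extension. A minimal sketch follows; the 2.9.10 threshold is only the version mentioned above as a likely fix, not a confirmed one.

```php
<?php
// LIBXML_DOTTED_VERSION is the libxml2 version string reported by PHP's
// libxml extension, for example "2.9.3".
echo 'libxml2 version: ', LIBXML_DOTTED_VERSION, "\n";

// Assumption from this thread: 2.9.10 may contain a fix. Unconfirmed.
if (version_compare(LIBXML_DOTTED_VERSION, '2.9.10', '<')) {
    echo "Older than 2.9.10; whitespace handling may differ.\n";
} else {
    echo "2.9.10 or newer.\n";
}
```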
 [2020-06-16 16:10 UTC] cmb@php.net
-Status: Assigned
+Status: Open
-Assigned To: cmb
+Assigned To:
 
Last updated: Mon Aug 03 21:01:24 2020 UTC