Bug #78129 DOMDocument removes whitespace between nodes
Submitted: 2019-06-08 14:31 UTC Modified: 2021-01-31 04:22 UTC
From: nospam at unclassified dot de Assigned: cmb
Status: No Feedback Package: DOM XML related
PHP Version: 7.3.6 OS: Linux
Private report: No CVE-ID: None

 [2019-06-08 14:31 UTC] nospam at unclassified dot de
Description:
------------
DOMDocument removes whitespace between HTML elements when loading and saving HTML content, which causes the content to be displayed incorrectly. The bug only affects Linux; the same PHP version works correctly on Windows.

Test script:
---------------
<?php
$html = '<html>
<body>
    <p>
        Text
    </p>
    <lang>Logged in as</lang>
    <strong>user</strong>
    <script src="jquery-3.4.1.min.js"></script>
    <script src="site.js"></script>
</body>
</html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

header('Content-Type: text/plain');
echo $dom->saveHTML();


Expected result:
----------------
This is the output on Windows:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
    <p>
        Text
    </p>
    <lang>Logged in as</lang>
    <strong>user</strong>
    <script src="jquery-3.4.1.min.js"></script>
    <script src="site.js"></script>
</body>
</html>


Actual result:
--------------
This is the output on Linux:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
    <p>
        Text
    </p>
    <lang>Logged in as</lang><strong>user</strong>
    <script src="jquery-3.4.1.min.js"></script><script src="site.js"></script></body></html>

There is no space between "Logged in as" and "user". The <lang> element is a custom element that my PHP script will resolve into plain HTML.
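To check for the problem without comparing `saveHTML()` output by eye, a script can inspect the parsed DOM directly. The following is a minimal sketch that assumes a shortened version of the markup from the test script above. The helper name `whitespaceBetweenLangAndStrong()` is hypothetical. It reports whether the node immediately after `<lang>` is a whitespace-only text node. The result depends on the libxml2 build in use, so no particular output is implied.

```php
<?php
// Hypothetical helper: returns true if the parsed DOM still contains a
// whitespace-only text node directly after the custom <lang> element.
function whitespaceBetweenLangAndStrong(string $html, int $options = 0): bool
{
    libxml_use_internal_errors(true); // <lang> is not valid HTML 4; silence warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html, $options);
    libxml_clear_errors();

    $lang = $dom->getElementsByTagName('lang')->item(0);
    if ($lang === null) {
        return false; // no <lang> element at all
    }
    $next = $lang->nextSibling;
    // A surviving separator is a text node whose content is only whitespace
    return $next instanceof DOMText && trim($next->nodeValue) === '';
}

$html = "<html><body>\n    <lang>Logged in as</lang>\n    <strong>user</strong>\n</body></html>";
var_dump(whitespaceBetweenLangAndStrong($html));
```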

 [2019-06-08 14:41 UTC] nospam at unclassified dot de
It seems that on Linux the LIBXML_NOBLANKS option is always applied, whereas on Windows it is not. I can get the same result on Windows as on Linux with this code:

$dom->loadHTML($html, LIBXML_NOBLANKS);

But there is no option that inverts it. To me it looks like the Linux build of PHP has a bug where LIBXML_NOBLANKS is always added when it should not be. I couldn't find any information about this option because the libxml documentation has no description for it; the entry is just empty.
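The effect of `LIBXML_NOBLANKS` can be measured instead of compared by eye. The sketch below uses PHP's `DOMXPath` and a hypothetical helper, `countBlankTextNodes()`. It counts whitespace-only text nodes after parsing the same markup with and without the flag. The assumption is that if the default parse already gives the lower count, the environment is behaving as if the flag were set. This is a diagnostic only, not a fix.

```php
<?php
// Hypothetical helper: parse $html with the given libxml options and count
// the text nodes that consist only of whitespace.
function countBlankTextNodes(string $html, int $options = 0): int
{
    libxml_use_internal_errors(true); // suppress warnings about unknown tags
    $dom = new DOMDocument();
    $dom->loadHTML($html, $options);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // normalize-space() of a whitespace-only string is the empty string
    return $xpath->query("//text()[normalize-space(.) = '']")->length;
}

$html = "<html><body>\n    <lang>Logged in as</lang>\n    <strong>user</strong>\n</body></html>";
$default  = countBlankTextNodes($html);
$noBlanks = countBlankTextNodes($html, LIBXML_NOBLANKS);
printf("default: %d, LIBXML_NOBLANKS: %d\n", $default, $noBlanks);
```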
 [2019-11-02 19:20 UTC] nospam at unclassified dot de
Can somebody please have a quick look at this? To me this looks like a simple fix: just edit the one line that sets the option in the Linux build.
 [2020-06-07 15:33 UTC] nospam at unclassified dot de
Is this issue tracker still in use and is anybody here? It feels a little lonely in here.
 [2020-06-08 09:09 UTC] cmb@php.net
-Status: Open +Status: Feedback -Package: *General Issues +Package: DOM XML related -Assigned To: +Assigned To: cmb
 [2020-06-08 09:09 UTC] cmb@php.net
I don't think this is related to the OS, but rather to the libxml2
version.  Which one do you use?  Could you try with the latest?
 [2020-06-14 17:42 UTC] nospam at unclassified dot de
-Status: Feedback +Status: Assigned
 [2020-06-14 17:42 UTC] nospam at unclassified dot de
I'm not sure which version I have. The server is running Ubuntu 16.04. There are two packages with that name, with versions "2.9.3+dfsg1-1ubuntu0.7" and "2.9.3+dfsg1-1". I can't try another version at the moment. I will try to set up another environment with Ubuntu 20.04; I have nothing newer.
 [2020-06-14 20:58 UTC] cmb@php.net
You can check which version is used by PHP with phpinfo().  Ubuntu
20.04 should have libxml 2.9.10, which would be the latest version,
and this issue has likely been fixed there.
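Besides phpinfo(), PHP exposes the libxml2 version through the `LIBXML_DOTTED_VERSION` (string) and `LIBXML_VERSION` (integer) constants. Both reflect the libxml2 headers PHP was compiled against, which may differ from the library actually loaded at runtime. The minimal sketch below compares against 2.9.10 only because that version is mentioned in this thread. It makes no claim about which release contains a fix.

```php
<?php
// LIBXML_DOTTED_VERSION looks like "2.9.10"; LIBXML_VERSION encodes the same
// as an integer (2.9.10 -> 20910). Both describe the compile-time headers.
echo 'libxml2 (compiled against): ', LIBXML_DOTTED_VERSION, "\n";

// 2.9.10 is used only as the comparison point mentioned in this report.
$atLeast2910 = version_compare(LIBXML_DOTTED_VERSION, '2.9.10', '>=');
echo $atLeast2910 ? "2.9.10 or newer\n" : "older than 2.9.10\n";
```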
 [2020-06-16 16:10 UTC] cmb@php.net
-Status: Assigned +Status: Open -Assigned To: cmb +Assigned To:
 [2021-01-18 17:50 UTC] cmb@php.net
-Status: Open +Status: Feedback -Assigned To: +Assigned To: cmb
 [2021-01-18 17:50 UTC] cmb@php.net
I still assume this is an upstream issue which has been fixed in
some libxml2 version.  Or can you prove me wrong?
 [2021-01-31 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 
PHP Copyright © 2001-2021 The PHP Group
All rights reserved.
Last updated: Sat Apr 17 11:01:23 2021 UTC