Bug #78129 DOMDocument removes whitespace between nodes
Submitted: 2019-06-08 14:31 UTC
Modified: 2021-01-31 04:22 UTC
From: nospam at unclassified dot de
Assigned: cmb
Status: No Feedback
Package: DOM XML related
PHP Version: 7.3.6
OS: Linux
Private report: No
CVE-ID: None

 [2019-06-08 14:31 UTC] nospam at unclassified dot de
Description:
------------
DOMDocument removes whitespace between HTML elements when loading and saving HTML content, which causes the content to be displayed incorrectly. The bug only affects Linux; the same PHP version works correctly on Windows.

Test script:
---------------
<?php
$html = '<html>
<body>
    <p>
        Text
    </p>
    <lang>Logged in as</lang>
    <strong>user</strong>
    <script src="jquery-3.4.1.min.js"></script>
    <script src="site.js"></script>
</body>
</html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

header('Content-Type: text/plain');
echo $dom->saveHTML();


Expected result:
----------------
This is the output on Windows:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
    <p>
        Text
    </p>
    <lang>Logged in as</lang>
    <strong>user</strong>
    <script src="jquery-3.4.1.min.js"></script>
    <script src="site.js"></script>
</body>
</html>


Actual result:
--------------
This is the output on Linux:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
    <p>
        Text
    </p>
    <lang>Logged in as</lang><strong>user</strong>
    <script src="jquery-3.4.1.min.js"></script><script src="site.js"></script></body></html>

There is no space between "Logged in as" and "user". The <lang> element is a custom element that my PHP script will resolve into plain HTML.
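
The missing whitespace can also be checked at the DOM level instead of by reading the serialized output. The following is a minimal sketch, assuming markup shaped like the test script above; the helper name `hasWhitespaceAfterLang` is hypothetical and not part of the original report. The result depends on the libxml2 build in use, so no expected output is given.

```php
<?php
// Hypothetical helper: reports whether a whitespace-only text node
// sits directly after the first <lang> element once the HTML is parsed.
function hasWhitespaceAfterLang(string $html, int $options = 0): bool
{
    // <lang> is not a known HTML tag, so collect parser errors silently.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html, $options);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $lang = $dom->getElementsByTagName('lang')->item(0);
    if ($lang === null) {
        return false;
    }

    $next = $lang->nextSibling;
    // trim() leaves an empty string when the node holds only whitespace.
    return $next instanceof DOMText && trim($next->nodeValue) === '';
}

$html = "<html><body>\n    <lang>Logged in as</lang>\n    <strong>user</strong>\n</body></html>";
var_dump(hasWhitespaceAfterLang($html));
```

If this prints `bool(false)` for the markup above, the whitespace was already dropped while parsing, before `saveHTML()` was called.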

History

 [2019-06-08 14:41 UTC] nospam at unclassified dot de
It seems Linux always adds the LIBXML_NOBLANKS option whereas Windows does not. I can get the same result on Windows as on Linux when I use this code:

$dom->loadHTML($html, LIBXML_NOBLANKS);

But there is no option that inverts it. To me it looks like the Linux build of PHP has a bug where the LIBXML_NOBLANKS option is always added when it should not be. I couldn't find information about that option because the libxml documentation has no description of it. It's just empty.
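
To compare the two behaviours on a single machine, the same markup can be round-tripped once with default options and once with LIBXML_NOBLANKS. This is only a diagnostic sketch: the helper name `roundTripHtml` is hypothetical, and identical output merely suggests that blank nodes are being dropped by default on that build. It does not show where the behaviour comes from.

```php
<?php
// Hypothetical helper: parse HTML with the given libxml options
// and serialize it again.
function roundTripHtml(string $html, int $options = 0): string
{
    $previous = libxml_use_internal_errors(true); // silence unknown-tag warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html, $options);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // saveHTML() returns false on failure; cast so the return type holds.
    return (string) $dom->saveHTML();
}

$html = '<html>
<body>
    <lang>Logged in as</lang>
    <strong>user</strong>
</body>
</html>';

$default  = roundTripHtml($html);
$noBlanks = roundTripHtml($html, LIBXML_NOBLANKS);

echo $default === $noBlanks
    ? "Default output equals LIBXML_NOBLANKS output (blanks appear to be dropped by default)\n"
    : "Default output differs from LIBXML_NOBLANKS output (blanks appear to be kept by default)\n";
```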
 [2019-11-02 19:20 UTC] nospam at unclassified dot de
Can somebody please have a quick look at this? To me this looks like a simple fix: just edit the one line that sets the option for the Linux build.
 [2020-06-07 15:33 UTC] nospam at unclassified dot de
Is this issue tracker still in use and is anybody here? It feels a little lonely in here.
 [2020-06-08 09:09 UTC] cmb@php.net
-Status: Open
+Status: Feedback
-Package: *General Issues
+Package: DOM XML related
-Assigned To:
+Assigned To: cmb
 [2020-06-08 09:09 UTC] cmb@php.net
I don't think this is related to the OS, but rather to the libxml2
version.  Which one do you use?  Could you try with the latest?
 [2020-06-14 17:42 UTC] nospam at unclassified dot de
-Status: Feedback
+Status: Assigned
 [2020-06-14 17:42 UTC] nospam at unclassified dot de
I'm not sure which version I have. The server is running Ubuntu 16.04. There are two packages with that name, they have the versions "2.9.3+dfsg1-1ubuntu0.7" and "2.9.3+dfsg1-1". I can't try with another version at the moment. Will try to build up another environment for that with Ubuntu 20.04, I have nothing newer.
 [2020-06-14 20:58 UTC] cmb@php.net
You can check which version is used by PHP with phpinfo().  Ubuntu
20.04 should have libxml2 2.9.10, which would be the latest version,
and this issue has likely been fixed there.
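
Besides phpinfo(), the libxml2 version can be read from a script through the constants that PHP's libxml extension defines. This is a small sketch; the helper name `libxmlVersionSummary` is hypothetical. As far as I know these constants reflect the version PHP was built against, while phpinfo() lists the compiled and the loaded version separately, so it is worth checking both if they might differ.

```php
<?php
// Hypothetical helper: summarizes the libxml2 version constants exposed by PHP.
function libxmlVersionSummary(): string
{
    // LIBXML_DOTTED_VERSION is a string such as "2.9.10";
    // LIBXML_VERSION is the same version as an integer such as 20910.
    return sprintf('libxml2 %s (%d)', LIBXML_DOTTED_VERSION, LIBXML_VERSION);
}

echo libxmlVersionSummary(), PHP_EOL;
```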
 [2020-06-16 16:10 UTC] cmb@php.net
-Status: Assigned
+Status: Open
-Assigned To: cmb
+Assigned To:
 [2021-01-18 17:50 UTC] cmb@php.net
-Status: Open
+Status: Feedback
-Assigned To:
+Assigned To: cmb
 [2021-01-18 17:50 UTC] cmb@php.net
I still assume this is an upstream issue which has been fixed in
some libxml2 version.  Or can you prove me wrong?
 [2021-01-31 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 