Bug #77651 DOMDocument preserves insignificant whitespace text nodes
Submitted: 2019-02-22 00:26 UTC
Modified: 2020-10-27 12:57 UTC
From: morozov at tut dot by
Assigned: cmb
Status: Not a bug
Package: DOM XML related
PHP Version: 7.3.2
OS: Windows
Private report: No
CVE-ID: None
 [2019-02-22 00:26 UTC] morozov at tut dot by
Description:
------------
DOMDocument::loadHTML() produces different results on Windows and Linux. Whitespace text nodes are ignored between non-inline elements on Linux but not on Windows.

The two behaviors are observed with the same PHP version (7.3.2) and nearby libxml2 versions (2.9.3 on Linux, 2.9.8 on Windows).
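
Since the comparison hinges on which parser library each machine runs, it may help to print that alongside the results. The following is a minimal sketch; `describeXmlEnvironment()` is a hypothetical helper name, while `PHP_VERSION`, `PHP_OS_FAMILY` (PHP 7.2+) and `LIBXML_DOTTED_VERSION` are existing PHP constants.

```php
<?php
// Hypothetical helper: summarize the values that matter when comparing
// DOM results across machines. LIBXML_DOTTED_VERSION is the libxml2
// version as reported by PHP's libxml extension.
function describeXmlEnvironment(): string
{
    return sprintf(
        'PHP %s on %s, libxml2 %s',
        PHP_VERSION,
        PHP_OS_FAMILY,
        LIBXML_DOTTED_VERSION
    );
}

echo describeXmlEnvironment(), PHP_EOL;
```

Running this on both systems shows whether the PHP version, the OS, or the libxml2 version is what actually differs.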

Test script:
---------------
<?php
$doc = new DOMDocument();
$doc->loadHTML(
    <<<HTML
<html>
    <body>
        <map>
           <area />
        </map>
        <div>
           <span />
        </div>
    </body>
</html>
HTML
);

var_dump(
    $doc->getElementsByTagName('map')
        ->item(0)
        ->childNodes
        ->length
);

var_dump(
    $doc->getElementsByTagName('div')
        ->item(0)
        ->childNodes
        ->length
);

Expected result:
----------------
int(1)
int(3)

Actual result:
--------------
int(3)
int(3)
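
The counts alone do not show which children are whitespace. A small sketch that labels each child node can make that visible; `describeChildren()` is a hypothetical helper, and what it returns for whitespace between elements depends on the installed libxml2, so no output is shown here.

```php
<?php
// Hypothetical helper: label each child of $node so that
// whitespace-only text nodes are easy to spot.
function describeChildren(DOMNode $node): array
{
    $labels = [];
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            // Distinguish whitespace-only text from text with content.
            $labels[] = trim($child->nodeValue) === '' ? '#whitespace' : '#text';
        } else {
            $labels[] = $child->nodeName;
        }
    }
    return $labels;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div> <span></span> </div><p>hello</p></body></html>');

print_r(describeChildren($doc->getElementsByTagName('div')->item(0)));
print_r(describeChildren($doc->getElementsByTagName('p')->item(0)));
```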

 [2020-10-27 12:57 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2020-10-27 12:57 UTC] cmb@php.net
Windows vs. Linux is irrelevant here; the behavior differs
between libxml2 versions. libxml2 2.9.10 on Linux also
reports

    int(3)
    int(3)

So this is an upstream issue, not a PHP bug.
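
For code that needs the same child counts regardless of the libxml2 version, one possible workaround (a sketch, not something proposed in this report) is to strip whitespace-only text nodes after loading. `removeWhitespaceTextNodes()` is a hypothetical helper; it relies only on `DOMXPath::query()` and the XPath 1.0 `normalize-space()` function. The markup uses `<area>` and `<span></span>` instead of the self-closing forms from the test script.

```php
<?php
// Hypothetical helper, not part of PHP: remove every text node that
// contains only whitespace, so child counts no longer depend on how
// the installed libxml2 treats such nodes.
function removeWhitespaceTextNodes(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    // Copy the matches into an array before modifying the tree.
    $blank = iterator_to_array(
        $xpath->query('//text()[normalize-space(.) = ""]')
    );
    foreach ($blank as $node) {
        $node->parentNode->removeChild($node);
    }
}

$doc = new DOMDocument();
$doc->loadHTML(
    "<html><body><map>\n <area>\n</map><div>\n <span></span>\n</div></body></html>"
);
removeWhitespaceTextNodes($doc);

var_dump($doc->getElementsByTagName('map')->item(0)->childNodes->length); // int(1)
var_dump($doc->getElementsByTagName('div')->item(0)->childNodes->length); // int(1)
```

Note that this also removes whitespace that is significant for rendering, for example inside `<pre>` or between inline elements, so it only suits cases where the tree is inspected rather than re-serialized for display.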