Bug #66712 Using LIBXML_HTML_NOIMPLIED on DomDocument::loadHTML() gives unexpected results
Submitted: 2014-02-14 01:23 UTC Modified: 2015-04-15 14:46 UTC
Votes:7
Avg. Score:4.1 ± 1.0
Reproduced:6 of 6 (100.0%)
Same Version:1 (16.7%)
Same OS:0 (0.0%)
From: chanson at mesd dot k12 dot or dot us Assigned:
Status: Verified Package: DOM XML related
PHP Version: 5.5.9 OS: Fedora 20 x86/64
Private report: No CVE-ID: None

 

 [2014-02-14 01:23 UTC] chanson at mesd dot k12 dot or dot us
Description:
------------
Passing the LIBXML_HTML_NOIMPLIED predefined constant to DOMDocument::loadHTML() gives unexpected results.

When the option is set, the nodeValue of the first item in a DOMNodeList contains the concatenated values of every node in the collection. This happens only when LIBXML_HTML_NOIMPLIED is passed to loadHTML(); without the option, each node reports only its own value.

I am currently running:
- PHP Version 5.5.8
- libxml Version 2.9.1

Test script:
---------------
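<?php
// An HTML fragment with three sibling elements and no single root element.
$html = '<h1>Foo</h1><h2>Bar</h2><p>lorem ipsum</p>';
$dom = new \DOMDocument();
// LIBXML_HTML_NOIMPLIED stops libxml from adding implied <html>/<body> elements.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED);
$nodes = $dom->getElementsByTagName('*');
// Print the first node on its own, then every node in the list.
echo $nodes->item(0)->tagName . ' -> ' . $nodes->item(0)->nodeValue . '<br/>';
foreach ($nodes as $node) {
    echo $node->tagName . ' -> ' . $node->nodeValue . '<br/>';
}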

Expected result:
----------------
h1 -> FooBarlorem ipsum
h1 -> FooBarlorem ipsum
h2 -> Bar
p -> lorem ipsum

Actual result:
--------------
h1 -> Foo
h1 -> Foo
h2 -> Bar
p -> lorem ipsum


History

 [2015-04-15 14:41 UTC] cmb@php.net
-Summary: Using LIBXML_NOHTML_IMPLIED on DomDocument::loadHTML() gives unexpected results
+Summary: Using LIBXML_HTML_NOIMPLIED on DomDocument::loadHTML() gives unexpected results
-Status: Open
+Status: Feedback
-Assigned To:
+Assigned To: cmb
 [2015-04-15 14:41 UTC] cmb@php.net
I am not able to reproduce this behavior, see
<http://3v4l.org/SFlNo>. Can you confirm?
 [2015-04-15 14:46 UTC] cmb@php.net
-Status: Feedback
+Status: Verified
-Assigned To: cmb
+Assigned To:
 [2015-04-15 14:46 UTC] cmb@php.net
Well, of course the behavior is reproducible -- only the expected
and actual behavior sections in the report are mixed up.
 [2015-11-09 14:47 UTC] dlundgren at syberisle dot net
This may be more of a documentation issue than a code issue. After running into the same problem recently, I found that the first element of the fragment becomes the root element of the document, and the remaining elements end up underneath it. To work around this, I wrapped the HTML fragment in a single root element and was then able to work with it as expected.

This is most likely down to not realizing that DOM Level 2 requires a document to have a single documentElement, with all other elements as its descendants. I had to look that up to understand what I was doing wrong.
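
For anyone landing here, the wrapper workaround described above might look roughly like the sketch below. The helper name loadFragment() and the choice of a <div> wrapper are my own assumptions for illustration, not something from the report or the PHP manual; the only library calls used are DOMDocument::loadHTML() and the LIBXML_HTML_NOIMPLIED constant from the original test script.

```php
<?php
/**
 * Hypothetical helper: load an HTML fragment under an explicit wrapper element
 * so that the fragment's first element is not promoted to documentElement.
 */
function loadFragment(string $html, string $wrapperTag = 'div'): DOMDocument
{
    $dom = new DOMDocument();
    // The wrapper supplies the single root element that the document needs.
    $dom->loadHTML("<{$wrapperTag}>{$html}</{$wrapperTag}>", LIBXML_HTML_NOIMPLIED);
    return $dom;
}

$dom = loadFragment('<h1>Foo</h1><h2>Bar</h2><p>lorem ipsum</p>');

// Walk the wrapper's direct children instead of the whole document,
// so the wrapper itself is not reported.
$lines = [];
foreach ($dom->documentElement->childNodes as $node) {
    if ($node instanceof DOMElement) {
        $lines[] = $node->tagName . ' -> ' . $node->nodeValue;
    }
}

echo implode("\n", $lines), "\n";
```

With the wrapper in place, each of h1, h2 and p should be a sibling under the wrapper and report only its own text. Remember to skip or strip the wrapper again if you serialize the result with saveHTML().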
 
Last updated: Wed Jan 29 05:01:24 2020 UTC