Bug #66712 Using LIBXML_HTML_NOIMPLIED on DomDocument::loadHTML() gives unexpected results
Submitted: 2014-02-14 01:23 UTC Modified: 2023-09-07 18:46 UTC
Avg. Score:4.1 ± 0.9
Reproduced:7 of 7 (100.0%)
Same Version:1 (14.3%)
Same OS:1 (14.3%)
From: chanson at mesd dot k12 dot or dot us Assigned: nielsdos (profile)
Status: Closed Package: DOM XML related
PHP Version: 5.5.9 OS: Fedora 20 x86/64
Private report: No CVE-ID: None
 [2014-02-14 01:23 UTC] chanson at mesd dot k12 dot or dot us
Using the LIBXML_HTML_NOIMPLIED predefined constant in the DomDocument class has unexpected results.

When the optional LIBXML_HTML_NOIMPLIED predefined constant is passed to the loadHTML() method, the nodeValue of the first DOMNodeList item contains the values of every item in the list. This does not happen when the constant is omitted.

I am currently running:
- PHP Version 5.5.8
- libxml Version 2.9.1

Test script:
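$html = '<h1>Foo</h1><h2>Bar</h2><p>lorem ipsum</p>';
$dom = new \DomDocument();
// Load the fragment without the implied <html>/<body> wrappers.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED);
$nodes = $dom->getElementsByTagName('*');
// Print the first element on its own, then every element in the list.
echo $nodes->item(0)->tagName . ' -> ' . $nodes->item(0)->nodeValue . '<br/>';
foreach ($nodes as $node) {
    echo $node->tagName . ' -> ' . $node->nodeValue . '<br/>';
}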

Expected result:
h1 -> FooBarlorem ipsum
h1 -> FooBarlorem ipsum
h2 -> Bar
p -> lorem ipsum

Actual result:
h1 -> Foo
h1 -> Foo
h2 -> Bar
p -> lorem ipsum


 [2015-04-15 14:41 UTC]
-Summary: Using LIBXML_NOHTML_IMPLIED on DomDocument::loadHTML() gives unexpected results
+Summary: Using LIBXML_HTML_NOIMPLIED on DomDocument::loadHTML() gives unexpected results
-Status: Open
+Status: Feedback
-Assigned To:
+Assigned To: cmb
 [2015-04-15 14:41 UTC]
I am not able to reproduce this behavior, see
<>. Can you confirm?
 [2015-04-15 14:46 UTC]
-Status: Feedback
+Status: Verified
-Assigned To: cmb
+Assigned To:
 [2015-04-15 14:46 UTC]
Well, of course the behavior is reproducible -- only the expected
and actual behavior sections in the report are mixed up.
 [2015-11-09 14:47 UTC] dlundgren at syberisle dot net
This may be more of a documentation issue than a code issue. After encountering the same problem recently, I found that the first element becomes the root element of the document. To offset this I wrapped the html fragment in a root element, and I was able to work with it that way.

This is most likely due to our lack of understanding that the DOM Level 2 requires a document to have a single documentElement, with children under it. I had to look that up to understand what I was doing wrong.
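
The wrapper workaround described above can be sketched as follows. This is a minimal illustration, not code from the report: the helper name loadFragment() and the choice of <div> as the wrapper are assumptions, and LIBXML_HTML_NODEFDTD is added only to keep the doctype out of the document.

```php
<?php
// Hypothetical helper (not part of the DOM extension): loads an HTML
// fragment under a temporary <div> so the document has a single root.
function loadFragment(string $fragment): DOMDocument
{
    $dom = new DOMDocument();
    $dom->loadHTML(
        '<div>' . $fragment . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    return $dom;
}

$dom = loadFragment('<h1>Foo</h1><h2>Bar</h2><p>lorem ipsum</p>');

// The wrapper <div> is the document element; the fragment's own
// top-level elements are its direct children.
foreach ($dom->documentElement->childNodes as $node) {
    if ($node instanceof DOMElement) {
        echo $node->tagName . ' -> ' . $node->nodeValue . PHP_EOL;
    }
}
```

With the wrapper in place, each top-level element should report only its own text, because none of them is promoted to the document root.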
 [2023-06-08 20:45 UTC]
-Type: Bug
+Type: Documentation Problem
 [2023-06-08 20:45 UTC]
Agreed that this is a documentation issue. The document element indeed becomes the <h1> tag and everything becomes nested under it: <h1>Foo<h2>Bar</h2><p>lorem ipsum</p></h1>
I think this is correct behaviour.
 [2023-09-07 18:45 UTC]
I contacted the libxml2 developer, and he confirmed it *is* actually a bug and has fixed the bug:
The next release of libxml2 will have the right behaviour, i.e. no nesting inside h1.
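
To see which behaviour a given installation has, a small check along these lines may help. It is only a sketch: it prints the libxml2 version PHP was built against and tests whether <h2> ended up inside <h1>. It does not assume any particular version number for the fix, since the thread does not state one.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML(
    '<h1>Foo</h1><h2>Bar</h2><p>lorem ipsum</p>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// Version of libxml2 that PHP was compiled against.
echo 'libxml2 ' . LIBXML_DOTTED_VERSION . PHP_EOL;

// Serialized tree, to inspect the nesting by eye.
echo $dom->saveHTML();

// Look for an <h2> among the descendants of the first <h1>.
$h1 = $dom->getElementsByTagName('h1')->item(0);
$nested = $h1 !== null && $h1->getElementsByTagName('h2')->length > 0;

echo $nested
    ? 'h2 is nested inside h1 (behaviour described in this report)'
    : 'h2 is a sibling of h1',
    PHP_EOL;
```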
 [2023-09-07 18:46 UTC]
-Status: Verified
+Status: Closed
-Type: Documentation Problem
+Type: Bug
-Assigned To:
+Assigned To: nielsdos
 [2023-09-07 18:46 UTC]
The fix for this bug has been committed.
If you are still experiencing this bug, try to check out latest source from and re-test.
Thank you for the report, and for helping us make PHP better.

Fixed in libxml2.
Last updated: Sat Jul 13 08:01:29 2024 UTC