Bug #38538 loadHTML doesn't use XHTML namespace
Submitted: 2006-08-21 23:25 UTC Modified: 2006-08-22 09:34 UTC
Avg. Score:4.8 ± 0.4
Reproduced:4 of 4 (100.0%)
Same Version:0 (0.0%)
Same OS:2 (50.0%)
From: spam02 at pornel dot net Assigned:
Status: Wont fix Package: DOM XML related
PHP Version: 6CVS-2006-08-21 (snap) OS: *
Private report: No CVE-ID: None


 [2006-08-21 23:25 UTC] spam02 at pornel dot net
From W3C: XHTML/1.0 is a reformulation of HTML 4 in XML. The semantics of HTML and XHTML elements are identical.

loadHTML() should put loaded elements in the XHTML namespace to preserve their semantics. These aren't just any random elements - these are HTML elements, and HTML elements in XML (and therefore in the DOM) are in the "http://www.w3.org/1999/xhtml" namespace.

This isn't a purely academic problem.

It's difficult to handle both HTML and XHTML uniformly using DOM in PHP - the difference in namespaces causes XPath/XSLT to behave differently.

AFAIK there's no trivial method of changing the namespace of all document elements, so the namespace returned by loadHTML() is quite important.
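To illustrate what a non-trivial workaround might look like, here is a minimal sketch that rebuilds an HTML-loaded tree element by element in the XHTML namespace. The helper name copyIntoNamespace() is hypothetical and not part of the DOM extension, and the namespace URI is the standard XHTML one; treat this as one possible approach under those assumptions rather than an established recipe.

```php
<?php
// Hypothetical helper (not part of the DOM extension): copies $node into
// $target, recreating every element in the namespace $ns.
function copyIntoNamespace(DOMNode $node, DOMDocument $target, string $ns): DOMNode
{
    if (!$node instanceof DOMElement) {
        // Text, comments, etc. carry no namespace; a deep import is enough.
        return $target->importNode($node, true);
    }
    $copy = $target->createElementNS($ns, $node->localName);
    foreach ($node->attributes as $attr) {
        $copy->setAttribute($attr->nodeName, $attr->nodeValue);
    }
    foreach ($node->childNodes as $child) {
        $copy->appendChild(copyIntoNamespace($child, $target, $ns));
    }
    return $copy;
}

$xhtmlNs = 'http://www.w3.org/1999/xhtml';

$html = new DOMDocument();
$html->loadHTML('<html><body>hello</body></html>');

// Build a second document whose elements all live in the XHTML namespace.
$xhtml = new DOMDocument();
$xhtml->appendChild(copyIntoNamespace($html->documentElement, $xhtml, $xhtmlNs));

$xpath = new DOMXPath($xhtml);
$xpath->registerNamespace('x', $xhtmlNs);
echo $xpath->evaluate('string(//x:body)'), "\n";
```

The copy walks the whole tree, so it costs time and memory proportional to the document size, which is part of why a namespace assigned at load time would be preferable.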

Simply putting elements in a namespace will break backwards compatibility a little (XPath queries, for example). Therefore I suggest adding an optional boolean argument to loadHTML() and loadHTMLFile() that enables the new behavior.
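The compatibility concern can be seen with a small sketch: an unprefixed XPath query that matches in a document without namespaces stops matching once the same elements are in the XHTML namespace. The variable names and the sample markup are illustrative only, and the namespaced document is produced with loadXML() here because loadHTML() has no such option.

```php
<?php
$xhtmlNs = 'http://www.w3.org/1999/xhtml';

// Elements loaded by loadHTML() are in no namespace.
$plain = new DOMDocument();
$plain->loadHTML('<html><body>hello</body></html>');

// The same structure, but with every element in the XHTML namespace.
$namespaced = new DOMDocument();
$namespaced->loadXML('<html xmlns="' . $xhtmlNs . '"><body>hello</body></html>');

// An unprefixed name test in XPath 1.0 only matches elements in no namespace.
$query = '//body';
$plainCount = (new DOMXPath($plain))->query($query)->length;
$namespacedCount = (new DOMXPath($namespaced))->query($query)->length;

echo "no namespace: $plainCount, XHTML namespace: $namespacedCount\n";
```

Existing code that relies on queries like //body would silently return nothing if the namespace were applied unconditionally, which is why an opt-in argument is suggested.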

Reproduce code:
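$html = new DOMDocument(); $html->loadHTML('<html><body>hello');
$xhtml = new DOMDocument(); $xhtml->loadXML('<html xmlns="http://www.w3.org/1999/xhtml"><body>hello</body></html>');

function test($doc)
{
    $x = new DOMXPath($doc);
    // Bind the prefix "x" to the XHTML namespace so //x:body can be resolved.
    $x->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
    echo $x->evaluate("string(//x:body)");
}

test($html);
test($xhtml);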

// local-name() could be used as a workaround in this particular test case; however, this isn't possible/feasible in every case.
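For reference, the local-name() workaround mentioned above might look like the following sketch. The sample markup and variable names are illustrative, and the namespace URI is the standard XHTML one.

```php
<?php
$html = new DOMDocument();
$html->loadHTML('<html><body>hello</body></html>');

$xhtml = new DOMDocument();
$xhtml->loadXML('<html xmlns="http://www.w3.org/1999/xhtml"><body>hello</body></html>');

// Matches elements named "body" in any namespace, or in none.
$query = "string(//*[local-name() = 'body'])";

foreach ([$html, $xhtml] as $doc) {
    $xpath = new DOMXPath($doc);
    echo $xpath->evaluate($query), "\n";
}
```

Every step of every path has to be rewritten this way, and the same trick is much more awkward in XSLT match patterns, which is why it does not scale beyond simple cases.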

Expected result:

Actual result:


 [2006-08-22 05:41 UTC]
loadHTML only properly deals with HTML(4) documents (which are 
by definition not namespace aware) and therefore discards 
namespaces.

If you want to keep the namespaces, use loadXML() or, for your 
proposal, use the tidy extension to make XHTML out of your 
HTML documents.
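A rough sketch of the tidy route suggested here, assuming the tidy extension is installed: the HTML is repaired into an XHTML string and then parsed with loadXML(). The helper name htmlToXhtmlDom() is hypothetical, and the chosen tidy options are one plausible configuration rather than a recommended one.

```php
<?php
// Hypothetical helper: returns null when the tidy extension is not loaded
// or when the repaired markup cannot be parsed as XML.
function htmlToXhtmlDom(string $html): ?DOMDocument
{
    if (!extension_loaded('tidy')) {
        return null;
    }
    $xhtml = tidy_repair_string($html, [
        'output-xhtml'     => true, // ask tidy for XHTML output
        'numeric-entities' => true, // avoid named entities such as &nbsp;
    ], 'utf8');
    if ($xhtml === false || $xhtml === '') {
        return null;
    }

    $doc = new DOMDocument();
    return $doc->loadXML($xhtml) ? $doc : null;
}

$doc = htmlToXhtmlDom('<html><body>hello');
if ($doc !== null) {
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
    echo trim($xpath->evaluate('string(//x:body)')), "\n";
}
```

This goes through a serialized string, so the document is parsed twice, and it depends on an extension that may not be available.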

 [2006-08-22 09:34 UTC] spam02 at pornel dot net
I'm not saying that loadHTML should read the namespace from the input - of course HTML/SGML syntax doesn't support it.

But by loading HTML into an XML DOM you're basically converting it to a namespace-aware representation. Being HTML is implied by the source format, and not explicitly stated in the document, so namespace information needs to be added.

Namespaces are used to distinguish incompatible nodes in the DOM; however, the DOM representations of XHTML and HTML are 100% compatible (same semantics, same structure).

Tidy nodes aren't compatible with PHP DOM extension, so this is not a solution.