Bug #38538 loadHTML doesn't use XHTML namespace
Submitted: 2006-08-21 23:25 UTC Modified: 2006-08-22 09:34 UTC
Avg. Score:4.8 ± 0.4
Reproduced:4 of 4 (100.0%)
Same Version:0 (0.0%)
Same OS:2 (50.0%)
From: spam02 at pornel dot net Assigned:
Status: Wont fix Package: DOM XML related
PHP Version: 6CVS-2006-08-21 (snap) OS: *
Private report: No CVE-ID: None


 [2006-08-21 23:25 UTC] spam02 at pornel dot net
From W3C: XHTML/1.0 is a reformulation of HTML 4 in XML. The semantics of HTML and XHTML elements are identical.

loadHTML() should put the loaded elements in the XHTML namespace to preserve their semantics. These aren't just arbitrary elements: they are HTML elements, and HTML elements in XML (and therefore in DOM) are in the "http://www.w3.org/1999/xhtml" namespace.

This isn't a purely academic problem.

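It's difficult to handle both HTML and XHTML uniformly using DOM in PHP: the difference in namespaces makes XPath and XSLT behave differently, because a name test bound to the XHTML namespace matches elements in one document but not in the other.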

AFAIK there's no trivial way to change the namespace of all elements in a document, so the namespace that loadHTML() assigns is quite important.

Simply putting the elements in a namespace would break backwards compatibility somewhat (existing XPath queries, for example). Therefore I suggest adding an optional boolean argument to loadHTML() and loadHTMLFile() that enables the new behavior.
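Since there is no built-in way to move every element into a namespace, a userland workaround has to rebuild the tree node by node. The sketch below assumes the XHTML namespace URI http://www.w3.org/1999/xhtml and uses two hypothetical helpers, copyIntoNamespace() and loadHTMLWithNamespace(); the optional $useXhtmlNamespace flag only imitates the boolean argument proposed above and is not part of PHP's DOM extension.

```php
<?php
// Sketch only: copyIntoNamespace() and loadHTMLWithNamespace() are
// hypothetical helpers, not part of PHP's DOM extension.
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Recursively copy $node into $target, re-creating every element in $ns.
function copyIntoNamespace(DOMNode $node, DOMDocument $target, string $ns): DOMNode
{
    if ($node instanceof DOMElement) {
        // Same local name, but now in the requested namespace.
        $copy = $target->createElementNS($ns, $node->localName);
        foreach ($node->attributes as $attr) {
            // HTML attributes are unprefixed, so they stay in no namespace.
            $copy->setAttribute($attr->name, $attr->value);
        }
        foreach ($node->childNodes as $child) {
            $copy->appendChild(copyIntoNamespace($child, $target, $ns));
        }
        return $copy;
    }
    // Text, comments and other non-element nodes are imported unchanged.
    return $target->importNode($node, true);
}

// The optional flag imitates the boolean argument proposed for loadHTML().
function loadHTMLWithNamespace(string $source, bool $useXhtmlNamespace = false): DOMDocument
{
    $html = new DOMDocument();
    $html->loadHTML($source);
    if (!$useXhtmlNamespace) {
        return $html; // current behaviour: elements in no namespace
    }
    $xhtml = new DOMDocument();
    $xhtml->appendChild(copyIntoNamespace($html->documentElement, $xhtml, XHTML_NS));
    return $xhtml;
}

$doc = loadHTMLWithNamespace('<html><body>hello', true);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('x', XHTML_NS);
echo $xpath->evaluate('string(//x:body)'), "\n"; // hello
```

The copy is a full second tree, so it costs memory proportional to the document, and the DOCTYPE is not carried over. Attribute names that are valid in HTML but not in XML would make setAttribute() throw a DOMException.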

Reproduce code:
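$html = new DOMDocument(); $html->loadHTML('<html><body>hello');
$xhtml = new DOMDocument(); $xhtml->loadXML('<html xmlns="http://www.w3.org/1999/xhtml"><body>hello</body></html>');

function test($doc) {
    $x = new DOMXPath($doc);
    $x->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
    echo $x->evaluate("string(//x:body)"), "\n";
}

test($html);  // prints an empty line: loadHTML() puts <body> in no namespace
test($xhtml); // prints "hello"

// local-name() could be used as a workaround in this particular test case, however this isn't possible/feasible in every case.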

Expected result:

Actual result:
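The local-name() workaround mentioned in the reproduce code can be sketched as follows; bodyText() is a hypothetical helper. It finds <body> in both documents because the predicate ignores namespaces altogether, which is also its weakness: every name test has to be rewritten as a predicate, and elements from another vocabulary that share a local name (SVG also has a title element, for example) would match too.

```php
<?php
// Sketch only: bodyText() is a hypothetical helper.
// The predicate compares the local name only, so it matches <body>
// whether it is in no namespace (loadHTML) or in the XHTML namespace.
function bodyText(DOMDocument $doc): string
{
    $xpath = new DOMXPath($doc);
    return $xpath->evaluate("string(//*[local-name() = 'body'])");
}

$html = new DOMDocument();
$html->loadHTML('<html><body>hello');

$xhtml = new DOMDocument();
$xhtml->loadXML('<html xmlns="http://www.w3.org/1999/xhtml"><body>hello</body></html>');

echo bodyText($html), "\n";  // hello
echo bodyText($xhtml), "\n"; // hello
```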


 [2006-08-22 05:41 UTC]
loadHTML only properly deals with HTML(4) documents (which are 
by definition not namespace-aware) and therefore discards 
namespaces.

If you want to keep the namespaces, use loadXML() or, for your 
proposal, use the tidy extension to make XHTML out of your 
HTML documents.
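The tidy suggestion can be sketched by round-tripping through a string: tidy repairs the markup and serializes it as XHTML, and DOMDocument::loadXML() then parses that text, so no tidy node objects ever reach the DOM extension. This assumes ext/tidy is installed; htmlToXhtmlDom() is a hypothetical name, while output-xhtml and numeric-entities are standard HTML Tidy configuration options.

```php
<?php
// Sketch only: htmlToXhtmlDom() is a hypothetical helper; ext/tidy must be installed.
function htmlToXhtmlDom(string $html): DOMDocument
{
    if (!function_exists('tidy_repair_string')) {
        throw new RuntimeException('The tidy extension is not available');
    }
    // output-xhtml: serialize as XHTML, declaring the XHTML namespace on <html>.
    // numeric-entities: emit &#160; instead of &nbsp;, which loadXML() would
    // reject because no DTD is loaded.
    $xhtmlSource = tidy_repair_string($html, ['output-xhtml' => true, 'numeric-entities' => true], 'utf8');
    if ($xhtmlSource === false) {
        throw new RuntimeException('tidy could not repair the input');
    }
    $doc = new DOMDocument();
    $doc->loadXML($xhtmlSource);
    return $doc;
}

$doc = htmlToXhtmlDom('<html><body>hello');
echo $doc->documentElement->namespaceURI, "\n";
```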

 [2006-08-22 09:34 UTC] spam02 at pornel dot net
I'm not saying that loadHTML should read the namespace from the input; of course HTML/SGML syntax doesn't support that.

But by loading HTML into an XML DOM you're essentially converting it into a namespace-aware representation. That the elements are HTML is implied by the source format rather than stated explicitly in the document, so the namespace information has to be added.

Namespaces are used to distinguish incompatible nodes in the DOM; however, the DOM representations of XHTML and HTML are 100% compatible (same semantics, same structure).
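To make the compatibility claim concrete, the sketch below lists every element of the two documents from the reproduce code together with its namespace. elementSummary() is a hypothetical helper; the point to observe is that both lists name the same elements in the same order and differ only in the namespace part.

```php
<?php
// Sketch only: elementSummary() is a hypothetical helper.
// Returns "localName in namespace" for every element, in document order.
function elementSummary(DOMDocument $doc): array
{
    $summary = [];
    foreach ($doc->getElementsByTagName('*') as $element) {
        // namespaceURI is null for elements created by loadHTML().
        $summary[] = $element->localName . ' in ' . ($element->namespaceURI ?? 'no namespace');
    }
    return $summary;
}

$html = new DOMDocument();
$html->loadHTML('<html><body>hello');

$xhtml = new DOMDocument();
$xhtml->loadXML('<html xmlns="http://www.w3.org/1999/xhtml"><body>hello</body></html>');

print_r(elementSummary($html));
print_r(elementSummary($xhtml));
```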

Tidy nodes aren't compatible with the PHP DOM extension, so this is not a solution.
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Mon Mar 04 15:01:31 2024 UTC