Bug #60021 DOMDocument errors on HTML5 tags
Submitted: 2011-10-09 05:24 UTC Modified: 2012-04-02 04:13 UTC
Votes:48
Avg. Score:4.7 ± 0.7
Reproduced:44 of 44 (100.0%)
Same Version:16 (36.4%)
Same OS:14 (31.8%)
From: drgroove at gmail dot com Assigned:
Status: Suspended Package: DOM XML related
PHP Version: 5.3.8 OS: Mac OS X
Private report: No CVE-ID: None

 [2011-10-09 05:24 UTC] drgroove at gmail dot com
Description:
------------
Loading an HTML file through DOMDocument::loadHTMLFile() produces the following warning when the file contains certain new HTML5 tags:

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Tag footer invalid in {file path here}

<footer> is a new HTML5 tag. The warning appears for other HTML5 tags as well (e.g., <header>).



Test script:
---------------
// TEST.html
<header>
     Some text here
</header>

// TEST.php
<?php
$dom_document = new DOMDocument();
$dom_document->loadHTMLFile("TEST.html");
?>



Expected result:
----------------
DOMDocument should not fail on HTML5 tags. 

Actual result:
--------------
Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Tag header invalid in {file path here}


History

 [2012-04-02 03:41 UTC] drgroove at gmail dot com
Any progress on resolving this?  Working w/ DOMDocument and HTML5 is a huge pain in the butt right now; you have to write custom error handlers for things like <header/>, <nav/>, and other HTML5 tags.  

Also, just entered a bug report for SimpleXML (where tags w/ both attributes and text have their attributes dropped).  Both DOMDocument and SimpleXML need updates... it's very difficult to work w/ HTML and XML when both of these APIs have so many issues. 

Thanks for your help everyone :)
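
For reference, the kind of custom error handling mentioned above might look like the following. This is only a minimal sketch, not an official fix: it assumes PHP 7.1 or later, the helper name loadHtml5Leniently() and the constant HTML_UNKNOWN_TAG are made up for illustration, and 801 is assumed to be libxml2's XML_HTML_UNKNOWN_TAG code (the one behind "Tag ... invalid"). It uses loadHTML() on a string instead of loadHTMLFile() so the example is self-contained.

```php
<?php
// Hypothetical constant: libxml2's XML_HTML_UNKNOWN_TAG error code,
// which is what "Tag header invalid" is reported under.
const HTML_UNKNOWN_TAG = 801;

/**
 * Hypothetical helper: parse HTML with libxml errors collected internally
 * (so no PHP warnings are emitted), then discard the unknown-tag errors.
 *
 * Returns [DOMDocument, LibXMLError[]], where the error list holds whatever
 * libxml reported apart from unknown-tag errors.
 */
function loadHtml5Leniently(string $html): array
{
    // Collect libxml errors instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Keep every error except "unknown tag".
    $errors = array_values(array_filter(
        libxml_get_errors(),
        static function (LibXMLError $error): bool {
            return $error->code !== HTML_UNKNOWN_TAG;
        }
    ));

    // Reset libxml's error buffer and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$doc, $errors];
}

[$doc, $errors] = loadHtml5Leniently('<header>Some text here</header>');
echo $doc->getElementsByTagName('header')->length, "\n";
```

The unknown tags still end up in the resulting DOM as ordinary elements; the sketch only hides the warnings. It does not make libxml2 parse according to the HTML5 algorithm.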
 [2012-04-02 04:13 UTC] aharvey@php.net
-Status: Open +Status: Suspended
 [2012-04-02 04:13 UTC] aharvey@php.net
It's a valid issue, but it's really an upstream one: libxml2's HTML parser only 
supports HTML 4.01, so until that's extended to support HTML5 or a new parser is 
added to libxml2, there's little to be done in PHP proper.

There are userspace parsers available: html5lib will parse documents according 
to the HTML5 algorithm and give you a DOMDocument to work with.

Suspending for now. Given that the issue was first raised upstream in 2008, I 
wouldn't hold your breath (although I suspect they'd love a patch).
 [2016-02-04 07:45 UTC] cweiske@php.net
There wasn't even an HTML5 tag bug report for libxml2 yet; I've created one now:
 https://bugzilla.gnome.org/show_bug.cgi?id=761534
 
Last updated: Wed Sep 18 11:01:30 2019 UTC