Bug #60021 DOMDocument errors on HTML5 tags
Submitted: 2011-10-09 05:24 UTC Modified: 2012-04-02 04:13 UTC
Votes:56
Avg. Score:4.7 ± 0.6
Reproduced:52 of 52 (100.0%)
Same Version:18 (34.6%)
Same OS:17 (32.7%)
From: drgroove at gmail dot com Assigned:
Status: Suspended Package: DOM XML related
PHP Version: 5.3.8 OS: Mac OS X
Private report: No CVE-ID: None

 [2011-10-09 05:24 UTC] drgroove at gmail dot com
Description:
------------
Loading HTML documents through DOMDocument->loadHTMLFile(), when the HTML file contains certain new HTML5 tags, results in this error: 

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Tag footer invalid in {file path here}

<footer> is a new HTML5 tag. The error appears for other HTML5 tags as well (e.g., <header>).



Test script:
---------------
// TEST.html
<header>
     Some text here
</header>

// TEST.php
<?php
$dom_document = new DOMDocument();
$dom_document->loadHTMLFile("TEST.html");
?>



Expected result:
----------------
DOMDocument should not fail on HTML5 tags. 

Actual result:
--------------
Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Tag footer invalid in {file path here}


 [2012-04-02 03:41 UTC] drgroove at gmail dot com
Any progress on resolving this?  Working w/ DOMDocument and HTML5 is a huge pain in the butt right now; you have to write custom error handlers for things like <header/>, <nav/>, and other HTML5 tags.  

Also, just entered a bug report for SimpleXML (where tags w/ both attributes and text have their attributes dropped).  Both DOMDocument and SimpleXML need updates... it's very difficult to work w/ HTML and XML when both of these APIs have so many issues. 

Thanks for your help everyone :)
 [2012-04-02 04:13 UTC] aharvey@php.net
-Status: Open +Status: Suspended
 [2012-04-02 04:13 UTC] aharvey@php.net
It's a valid issue, but it's really an upstream one: libxml2's HTML parser only 
supports HTML 4.01, so until that's extended to support HTML5 or a new parser is 
added to libxml2, there's little to be done in PHP proper.

There are userspace parsers available: html5lib will parse documents according 
to the HTML5 algorithm and give you a DOMDocument to work with.

Suspending for now. Given that the issue was first raised upstream in 2008, I 
wouldn't hold your breath (although I suspect they'd love a patch).
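
Until the upstream parser changes, the "custom error handler" workaround mentioned earlier in the thread can be sketched roughly as follows. This is a minimal illustration, not an official fix: loadHtml5Leniently() and the HTML_UNKNOWN_TAG constant are hypothetical names introduced here, and the value 801 is assumed to be libxml2's XML_HTML_UNKNOWN_TAG error code (PHP does not expose a named constant for it). The sketch collects libxml errors instead of emitting warnings and discards only the unknown-tag ones, so other markup problems remain visible.

```php
<?php
// Assumption: 801 is libxml2's XML_HTML_UNKNOWN_TAG ("Tag ... invalid").
const HTML_UNKNOWN_TAG = 801;

/**
 * Hypothetical helper: parse HTML with DOMDocument while ignoring
 * unknown-tag errors. Returns [DOMDocument, remaining LibXMLError[]].
 */
function loadHtml5Leniently(string $html): array
{
    // Buffer libxml errors internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Keep every error except the unknown-tag ones.
    $errors = array_values(array_filter(
        libxml_get_errors(),
        function (LibXMLError $e) {
            return $e->code !== HTML_UNKNOWN_TAG;
        }
    ));

    // Reset the error buffer and restore the caller's error mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$doc, $errors];
}

[$doc, $errors] = loadHtml5Leniently('<header>Some text here</header>');
echo $doc->getElementsByTagName('header')->length, "\n";
```

The unknown element still ends up in the tree; only the diagnostic is filtered. Note that this does not make the parser follow the HTML5 algorithm, so documents that rely on HTML5-specific parsing rules may still be handled differently than in a browser.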
 [2016-02-04 07:45 UTC] cweiske@php.net
There wasn't even an HTML5 tag bug report for libxml2 yet; I've created it now:
 https://bugzilla.gnome.org/show_bug.cgi?id=761534
 [2020-01-31 13:56 UTC] matthewheroux at gmail dot com
HTML5 has been the standard for years. This is negatively impacting PHP. It's a huge issue for keeping PHP relevant and modern. It seems like such an achievable fix: a few entities, a different doc tag.
 [2021-07-21 00:48 UTC] apostnikov at gmail dot com
Still getting the error on PHP 8.1 beta1:

> Warning: DOMDocument::loadHTMLFile(): Tag header invalid in 1.html
 [2021-07-21 13:24 UTC] apostnikov at gmail dot com
Related issue https://gitlab.gnome.org/GNOME/libxml2/-/issues/211
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Wed Dec 04 19:01:32 2024 UTC