Bug #47209 DOMDocument's inferior parsing of malformed HTML
Submitted: 2009-01-24 15:30 UTC Modified: 2009-01-24 16:09 UTC
From: queen dot zeal at gmail dot com Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.2.8 OS:
Private report: No CVE-ID: None
 [2009-01-24 15:30 UTC] queen dot zeal at gmail dot com
Description:
------------
I'm trying to get a list of all the input elements of a form tag and
am having some difficulty doing so due to a PHP bug.  First, here's my XHTML:

<div>
    <form action="">
    <input type="text" name="a" />
</div>
<div>
    <input type="text" name="b" />
</div>
<div>
    <input type="submit" />
    </form>
</div>

It isn't well-formed XHTML (the form element overlaps the divs), but that
doesn't stop web developers from coding like that.

Anyway, in both Firefox and IE, if you visit a web page containing the
above and click the Submit button, the resulting URL has both a
and b defined as GET parameters.

I'd like to be able to get the same list of input parameters that
the browser does for a given form element. I had been using
"//form[1]//input" as an XPath query, but that doesn't work here,
because not all of the inputs are descendants of the form element.
Indeed, if I use DOMDocument::saveHTML(), I get something more like
this:

<div>
    <form action="">
    <input type="text" name="a" />
    </form>
</div>
<div>
    <input type="text" name="b" />
</div>
<div>
    <input type="submit" />
</div>
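For comparison, here is a minimal sketch of the browser-side behavior described above. It is written in Python with the standard-library `html.parser` module rather than PHP/libxml, purely so the example is self-contained. The HTML Standard's tree builder keeps a "form element pointer" that a `<form>` start tag sets and only a `</form>` end tag clears, so inputs parsed in between stay associated with that form even after `</div>` has closed the form element in the tree; that is consistent with the Firefox and IE result reported here. The class name `FormInputCollector` is hypothetical, and the sketch deliberately ignores the `form` attribute, `<template>`, disabled controls, and non-`input` controls such as `<select>` and `<textarea>`.

```python
from html.parser import HTMLParser


class FormInputCollector(HTMLParser):
    """Collect input names per form in token order (hypothetical helper).

    Simplified model of the HTML Standard's "form element pointer": it is set
    by a <form> start tag and cleared only by a </form> end tag, so closing an
    enclosing <div> does not end the form's association with later inputs.
    """

    def __init__(self):
        super().__init__()
        self.forms = []        # one list of input names per <form> seen
        self._current = None   # the open form's list, or None

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # A nested <form> start tag is ignored while another form is open.
            if self._current is None:
                self._current = []
                self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            name = dict(attrs).get("name")
            # Inputs without a non-empty name contribute no GET parameter.
            if name:
                self._current.append(name)

    # Self-closing tags such as <input ... /> reach handle_starttag via
    # HTMLParser's default handle_startendtag, so no override is needed.

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None  # </div> and other end tags leave it set


markup = """<div>
    <form action="">
    <input type="text" name="a" />
</div>
<div>
    <input type="text" name="b" />
</div>
<div>
    <input type="submit" />
    </form>
</div>"""

collector = FormInputCollector()
collector.feed(markup)
collector.close()
print(collector.forms)  # [['a', 'b']]
```

Because it follows token order instead of tree nesting, this approach never consults the repaired tree that DOMDocument::saveHTML() shows, which is why it reports both a and b where "//form[1]//input" does not.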

Before you pass the buck to the libxml developers without even reviewing this, consider first that the bug might not be in libxml but in PHP's bindings to libxml.

Further, if you're going to pass the buck, do so yourself. I don't know C or C++ or whatever language libxml was originally intended to be used with. Since I don't know C/C++, any bug report I filed with the libxml developers would have to be written in terms of PHP, which they may or may not know themselves. As such, it wouldn't be a very useful bug report, whereas if the person who implemented PHP's libxml bindings filed it, they could make it far more useful.

Maybe a good fix for PHP (one that wouldn't involve the libxml people) would be to use a different parsing engine, for example the one in Gecko, the HTML rendering engine that Firefox uses.


 [2009-01-24 16:09 UTC] rrichards@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

DOM is meant to handle XML-conforming data (not broken HTML). It's not a
libxml bug either. The HTML load functionality takes a best guess at
fixing developer tag soup.
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Fri Mar 14 09:01:29 2025 UTC