Bug #47209 DOMDocument's inferior parsing of malformed HTML
Submitted: 2009-01-24 15:30 UTC
Modified: 2009-01-24 16:09 UTC
From: queen dot zeal at gmail dot com
Assigned:
Status: Not a bug
Package: DOM XML related
PHP Version: 5.2.8
OS:
Private report: No
CVE-ID: None
 [2009-01-24 15:30 UTC] queen dot zeal at gmail dot com
Description:
------------
I'm trying to get a list of all the input elements of a form tag and
am having some difficulty doing so due to a PHP bug.  First, here's my XHTML:

<div>
    <form action="">
    <input type="text" name="a" />
</div>
<div>
    <input type="text" name="b" />
</div>
<div>
    <input type="submit" />
    </form>
</div>

It isn't semantically correct XHTML but that doesn't stop web
developers from coding like that.

Anyway, in both Firefox and IE, if you visit a webpage containing the
above, and hit the Submit button, the resultant URL will have both a
and b defined via GET.

I'd like to be able to get a list of the same input parameters that
the browser does for a given form element.  I had been using
"//form[1]//input" as an XPath query, but that doesn't work here,
because not all of the inputs are descendants of the form element.
Indeed, if I use DOMDocument::saveHTML(), I get something more like
this:

<div>
    <form action="">
    <input type="text" name="a" />
    </form>
</div>
<div>
    <input type="text" name="b" />
</div>
<div>
    <input type="submit" />
</div>
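
For anyone wanting to reproduce this, here is a minimal sketch, assuming the markup is loaded with DOMDocument::loadHTML() and queried through DOMXPath (the report names neither call explicitly). The variable names are illustrative only, and the exact tree the parser builds may differ between libxml2 versions, so no output is shown.

```php
<?php
// The markup from the report: <form> opens in the first <div>
// and is closed in the third one.
$html = '<div>
    <form action="">
    <input type="text" name="a" />
</div>
<div>
    <input type="text" name="b" />
</div>
<div>
    <input type="submit" />
    </form>
</div>';

// Keep parser warnings about the mismatched tags out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Inputs that are descendants of the form after the parser's repair.
$inForm = $xpath->query('//form[1]//input');

// Every input in the document, wherever it ended up in the tree.
$all = $xpath->query('//input');

printf("inputs under form: %d, inputs in document: %d\n",
    $inForm->length, $all->length);

// Serialized tree, showing where the parser closed the form.
echo $doc->saveHTML();
```

Comparing the two counts shows whether any inputs landed outside the form element in the repaired tree.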

Before you go off and pass the buck to the libxml developers without even reviewing this, consider first that the bug might not be in libxml, but rather in PHP's bindings to libxml.

Further, if you're going to pass the buck, do so yourself.  I don't know C or C++ or whatever language libxml was originally intended to be used with.  Since I don't know C / C++, if I were to file a bug report with the libxml developers, it'd have to be in PHP, which they may or may not know themselves.  As such, it wouldn't be a very useful bug report, whereas if the person who implemented the libxml bindings for PHP filed it, they could make it a whole lot more useful.

Maybe a good fix for PHP (that wouldn't involve the libxml people) would be to use a different XML parsing engine.  Maybe use the HTML rendering engine that Firefox uses - Gecko.


 [2009-01-24 16:09 UTC] rrichards@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

DOM is meant to handle XML-conforming data (not broken HTML). It's not a
libxml bug either. The HTML load functionality takes a best guess at
fixing developer tag soup.
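
To see what the parser objected to while making that guess, one option is to collect libxml's error list after loading. This is only a sketch: the shortened markup and variable names are illustrative, and which messages appear (if any) depends on the libxml2 version.

```php
<?php
// Collect parser messages instead of raising PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML(
    '<div><form action=""><input type="text" name="a" /></div>' .
    '<div><input type="submit" /></form></div>'
);

// Each entry is a LibXMLError with line, code and message properties.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Reset the buffer so later loads start clean.
libxml_clear_errors();
```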
 