Bug #49984 DOM model is completely broken
Submitted: 2009-10-24 04:27 UTC Modified: 2009-11-04 09:42 UTC
Votes:3
Avg. Score:5.0 ± 0.0
Reproduced:3 of 3 (100.0%)
Same Version:3 (100.0%)
Same OS:3 (100.0%)
From: ppass at hotmail dot fr Assigned:
Status: Not a bug Package: DOM XML related
PHP Version: 5.2.11 OS: Linux ns1 2.6.28.4-rsbac
Private report: No CVE-ID: None
 [2009-10-24 04:27 UTC] ppass at hotmail dot fr
Description:
------------
The script node's parent is a div.
The script node has the text '</div>' inside its script.

The DOM node returns only partial contents of the script node, as if the node was mistakenly truncated when reaching the '</div>' text.

Reproduce code:
---------------
    $html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"><title>Title</title></head><body><div><script type="text/javascript" id="script1">function dummy { object.innerHTML="<div>text</div>"; } function dummy2 { alert("hello"); } </script> </div> </body> </html>';
 
    $dom = new DOMDocument('1.0', 'UTF-8');
    @$dom->loadHTML($html);

    $script_node = $dom->getElementById('script1');
    echo "<![CDATA[$script_node->nodeValue]]>";


Expected result:
----------------
function dummy { object.innerHTML="<div>text</div>"; } function dummy2 { alert("hello"); } 

I expect to see the whole content of the script node.

Actual result:
--------------
function dummy { object.innerHTML="<div>text

The script node has been truncated.


History

 [2009-11-02 05:42 UTC] ppass at hotmail dot fr
Still no reaction to this bug. Maybe my previous title was too specific. More generally speaking, it means that the DOM model is broken in PHP whenever a script tag contains other tags in its text.

This is a serious bug that must be corrected ASAP; otherwise it is not possible to make reliable use of DOM.
 [2009-11-02 06:53 UTC] rasmus@php.net
We didn't write the DOM implementation.  We are simply using libxml2.  Information on how to file a bug against libxml2 is here: http://xmlsoft.org/bugs.html

But I suspect they won't consider this a bug.  Their relaxed HTML parser isn't a full HTML parser that knows about embedded script objects.

This would only be a PHP bug if we are somehow calling libxml2 incorrectly causing this, but it doesn't appear to be the case here.
 [2009-11-02 13:46 UTC] ppass at hotmail dot fr
Thank you for the details. I just filed a bug in their system.
 [2009-11-02 17:45 UTC] ppass at hotmail dot fr
The reply from the libxml2 team is to try adding the HTML_PARSE_RECOVER option when creating the parsing context.

I have no idea what that means. Does anybody know how this can be done from PHP code?
 [2009-11-04 09:42 UTC] ppass at hotmail dot fr
This is still an open topic for me, since there seems to be no easy way to implement their suggestion in PHP (adding the HTML_PARSE_RECOVER option when creating the parsing context).

Is this something that can be done in PHP and how?
Please advise, otherwise the subject remains open.
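
A possible workaround, sketched here under stated assumptions: HTML 4.01 says that inside a script element the first "</" ends the element's content, and recommends writing "<\/" in script strings instead, which matches the truncation reported above. The sketch below does not set HTML_PARSE_RECOVER; it only rewrites "</" to "<\/" inside script bodies before calling loadHTML(). The helper name escape_script_end_tags() is hypothetical (it is not part of PHP or libxml2), the regular expression is deliberately naive, and the sample JavaScript is a lightly adapted version of the reproduce code.

```php
<?php
// Hypothetical helper, not part of PHP or libxml2: escapes "</" inside
// <script> bodies so the parser does not see an end tag in the script text.
// Naive by design: it assumes no literal "</script>" inside the script itself.
function escape_script_end_tags(string $html): string
{
    $result = preg_replace_callback(
        '~(<script\b[^>]*>)(.*?)(</script\s*>)~is',
        function (array $m): string {
            // $m[1] = opening tag, $m[2] = script body, $m[3] = closing tag
            return $m[1] . str_replace('</', '<\/', $m[2]) . $m[3];
        },
        $html
    );

    // preg_replace_callback() returns null on failure; fall back to the input.
    return $result ?? $html;
}

// Adapted from the reproduce code above (parentheses added to the functions).
$html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'
      . '<html><head><title>Title</title></head><body><div>'
      . '<script type="text/javascript" id="script1">'
      . 'function dummy() { object.innerHTML="<div>text</div>"; } '
      . 'function dummy2() { alert("hello"); } '
      . '</script></div></body></html>';

$dom = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);   // collect parser warnings instead of using @
$dom->loadHTML(escape_script_end_tags($html));
libxml_clear_errors();

$script_node = $dom->getElementById('script1');
$script_text = $script_node->nodeValue;
echo $script_text, PHP_EOL;
```

In a JavaScript string literal "<\/div>" means the same as "</div>", so the script's behavior should be unchanged; the node value returned by DOM will contain the escaped form.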
 