Bug #16952 Bug when parsing XML file using XML_Tree
Submitted: 2002-05-01 16:57 UTC Modified: 2002-05-17 15:58 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: jmcastagnetto@php.net Assigned: cox (profile)
Status: Closed Package: PEAR related
PHP Version: 4.2.0 OS: RH Linux 6.1
Private report: No CVE-ID: None
 [2002-05-01 16:57 UTC] jmcastagnetto@php.net
I was testing XML_Tree by parsing some files from the PHPDOC directory (the PHP Documentation in cvs.php.net/cvs.php/phpdoc), and found a really nasty bug that makes this class less than useful.

It seems that XML_Tree (or XML_Parser, I have not isolated the problem yet) simply drops content when parsing the files. I used the following code:



=== testTree.php ===

<?php
require_once "XML/Tree.php";

// change to wherever you have checked out phpdoc
$phpdoc = "/home/jesus/devel/php/phpdoc/";
$entities = "entities/global.ent";
$file = "en/language/control-structures.xml";

// read the chapter and the entity declarations as arrays of lines
$xmlarr = file($phpdoc.$file);
$entarr = file($phpdoc.$entities);

// build a DOCTYPE whose internal subset holds the entity declarations
$doctype = '<!DOCTYPE chapter PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4/EN"
				"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd" [
'.implode("",$entarr)."\n]>\n";

// first line of the file goes on top, then the DOCTYPE, then the rest
$xmltop = $xmlarr[0];
$xmlbody = implode("",array_slice($xmlarr,1));

$xmldoc = $xmltop.$doctype.$xmlbody;
//echo $xmldoc;
$tree = new XML_Tree();
$root = $tree->getTreeFromString($xmldoc);
//$root->dump();
print_r($root);

=== end ===



And when I run it with the $root->dump() line uncommented, I get bits like:



=== $root->dump() ===
...
<sect1 id="function.include-once">
<title>
<function>
   include_once</function>
</title>
<para>
<function>
    The include_once</function>
<function>include</function>
</para>
...
=== end ===



When echoing $xmldoc above, that same section looks like:



=== echo $xmldoc; ===
...
 <sect1 id="function.include-once">
   <title><function>include_once</function></title>
   <para>
    The <function>include_once</function> statement includes and evaluates
    the specified file during the execution of the script.
    This is a behavior similar to the <function>include</function> statement,
    with the only difference being that if the code from a file has already
    been included, it will not be included again.  As the name suggests,
    it will be included just once.
   </para>
...
=== end ===



Clearly something went terribly wrong while parsing or constructing the tree. Adding the code below to the script assured me that the XML parsing library (expat) was not the one messing up:

=== extra code ===

$xml = xml_parser_create();
// $vals and $index are filled in by reference; no call-time & is needed
xml_parse_into_struct($xml, $xmldoc, $vals, $index);
xml_parser_free($xml);
print_r($vals);

=== end ===



That bit of code got the content OK. I think that a thorough revision of XML_Tree and/or XML_Parser is warranted.
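
A reduced version of that check, as a sketch only: the fragment below is a hypothetical one-line stand-in for the phpdoc paragraph shown earlier, not the real file, and collect_text() is a helper name introduced purely for illustration. It walks the array filled in by xml_parse_into_struct() and joins every reported text value, which is enough to see whether the text around a child element survives at the parser level.

```php
<?php
// Hypothetical stand-in for the <para> from control-structures.xml.
$fragment = '<para>The <function>include_once</function> statement.</para>';

// Illustrative helper (not part of XML_Tree or XML_Parser): returns the
// concatenated character data and the list of entry types reported by
// xml_parse_into_struct(), both in document order.
function collect_text($xml)
{
    $parser = xml_parser_create();
    $vals = array();
    $index = array();
    xml_parse_into_struct($parser, $xml, $vals, $index);

    $text = '';
    $types = array();
    foreach ($vals as $entry) {
        $types[] = $entry['type'];
        if (isset($entry['value'])) {
            $text .= $entry['value'];
        }
    }
    return array($text, $types);
}

list($text, $types) = collect_text($fragment);
echo $text, "\n";               // all text, in document order
echo implode(' ', $types), "\n"; // entry types, e.g. open/complete/cdata/close
```

Text that follows a child element shows up as its own entry of type "cdata" at the parent's level, so a tree builder that only keeps the value attached to the "open" or "complete" entry would lose exactly the kind of content that is missing from the dump above.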

 [2002-05-01 17:01 UTC] jmcastagnetto@php.net
BTW, I am using the latest version of XML_Tree from the CVS tree (pear/XML_Tree), and the XML_Parser included w/ PHP 4.2.0
 [2002-05-07 12:46 UTC] cox@php.net
The current implementation I did does not support mixed XML contents (#PCDATA + elements). If someone could tell me how DOM represents that, I could try to fix this issue.

This will be enough:
<?php
$xmlstr = '
<root><child>
This is a <tag bar="a">foo</tag> mixed example
</child></root>';
print_r(domxml_xmltree($xmlstr));
?>

Thanks,

Tomas V.V.Cox
 [2002-05-13 22:41 UTC] jmcastagnetto@php.net
I am not sure how libdomxml does the traversal, but here is the W3C spec for traversal:

http://www.w3.org/TR/2000/REC-DOM-Level-2-Traversal-Range-20001113/traversal.html
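
For a concrete picture of the data model behind that spec, here is a minimal sketch. It assumes the DOM extension of PHP 5 and later (DOMDocument) rather than the domxml functions available in PHP 4.2.0, so it is not what domxml_xmltree() would print, and the variable names are illustrative. In the DOM, mixed content is an ordered list of child nodes in which text nodes and element nodes are siblings:

```php
<?php
// Same mixed-content example as above, kept on one line so that the
// text nodes contain no extra newlines.
$xmlstr = '<root><child>This is a <tag bar="a">foo</tag> mixed example</child></root>';

$doc = new DOMDocument();
$doc->loadXML($xmlstr);

// <child> is the first (and only) child of the document element <root>.
$child = $doc->documentElement->firstChild;

// Record each child node of <child> in document order.
$summary = array();
foreach ($child->childNodes as $node) {
    if ($node->nodeType === XML_TEXT_NODE) {
        $summary[] = array('text', $node->nodeValue);
    } elseif ($node->nodeType === XML_ELEMENT_NODE) {
        $summary[] = array('element', $node->nodeName);
    }
}

print_r($summary);
```

The point for a tree class is that character data is not a single property of the parent element: each run of text is a node of its own, kept in order alongside the element nodes.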


 [2002-05-17 08:20 UTC] cox@php.net
OK, after some research time and a headache, I committed a fix for that. Please try it and tell me if the support is now correct.

-- Tomas V.V.Cox
 [2002-05-17 15:58 UTC] jmcastagnetto@php.net
The class now works as expected, and it does not choke on <![CDATA[]]>. It does lose the CDATA section when dump()'ing to a string, but that is a minor thing.

Good work Tomas!
 