Bug #16952 Bug when parsing XML file using XML_Tree
Submitted: 2002-05-01 16:57 UTC Modified: 2002-05-17 15:58 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: jmcastagnetto@php.net Assigned: cox (profile)
Status: Closed Package: PEAR related
PHP Version: 4.2.0 OS: RH Linux 6.1
Private report: No CVE-ID: None
 [2002-05-01 16:57 UTC] jmcastagnetto@php.net
I was testing XML_Tree by parsing some files from the PHPDOC directory (the PHP documentation in cvs.php.net/cvs.php/phpdoc), and found a really nasty bug that makes this class less than useful.

It seems that XML_Tree (or XML_Parser, I have not isolated the problem yet) simply drops content when parsing the files. I used the following code:



=== testTree.php ===

<?php
require_once "XML/Tree.php";

// change to wherever you have checked out phpdoc
$phpdoc = "/home/jesus/devel/php/phpdoc/";
$entities = "entities/global.ent";
$file = "en/language/control-structures.xml";

// read the chapter and the entity declarations as arrays of lines
$xmlarr = file($phpdoc.$file);
$entarr = file($phpdoc.$entities);

// build a DOCTYPE whose internal subset carries the entity declarations
$doctype = '<!DOCTYPE chapter PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4/EN"
				"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd" [
'.implode("",$entarr)."\n]>\n";

// keep the first line of the file on top and put the DOCTYPE right after it
$xmltop = $xmlarr[0];
$xmlbody = implode("",array_slice($xmlarr,1));

$xmldoc = $xmltop.$doctype.$xmlbody;
//echo $xmldoc;
$tree = new XML_Tree();
$root = $tree->getTreeFromString($xmldoc);
//$root->dump();
print_r($root);

=== end ===



And when I run it with the $root->dump() line uncommented, I get bits like:



=== $root->dump() ===
...
<sect1 id="function.include-once">
<title>
<function>
   include_once</function>
</title>
<para>
<function>
    The include_once</function>
<function>include</function>
</para>
...
=== end ===



When echoing $xmldoc above, that same section looks like:



=== echo $xmldoc; ===
...
 <sect1 id="function.include-once">
   <title><function>include_once</function></title>
   <para>
    The <function>include_once</function> statement includes and evaluates
    the specified file during the execution of the script.
    This is a behavior similar to the <function>include</function> statement,
    with the only difference being that if the code from a file has already
    been included, it will not be included again.  As the name suggests, 
    it will be included just once.
   </para>
...
=== end ===



Clearly something went terribly wrong while parsing or constructing the tree. Adding the code below assured me that the xml parsing lib (expat) was not the one messing up:



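=== extra code ===

$xml = xml_parser_create();
// $vals and $index are filled by reference; the function signature already
// declares them as such, so no call-time & is needed
xml_parse_into_struct($xml, $xmldoc, $vals, $index);
xml_parser_free($xml);
print_r($vals);

=== end ===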



That bit of code got the content OK. I think that a thorough revision of XML_Tree and/or XML_Parser is warranted.
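
To make the comparison concrete, here is a minimal sketch of the same expat check on a tiny mixed-content fragment. The input string and the variable names $pieces and $text are my own for illustration (not taken from phpdoc), and the short array syntax assumes a current PHP rather than 4.2.0. Each entry that xml_parse_into_struct() returns with a 'value' key carries one run of character data, so gluing the values back together should give the full text if nothing was dropped.

```php
<?php
// Hypothetical input: a small mixed-content fragment, not the real phpdoc file.
$xmldoc = "<para>The <function>include_once</function> statement</para>";

$parser = xml_parser_create();
// Keep tag names as written instead of upper-casing them (the default).
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parse_into_struct($parser, $xmldoc, $vals, $index);

// Collect every run of character data that was reported, in document order.
$pieces = [];
foreach ($vals as $entry) {
    if (isset($entry['value'])) {
        $pieces[] = $entry['tag'] . ' (' . $entry['type'] . '): ' . $entry['value'];
    }
}
print_r($pieces);

// Concatenating the values should reproduce the complete text of the paragraph.
$text = implode('', array_column($vals, 'value'));
echo $text, "\n";
```

If the text after </function> shows up here but not in the XML_Tree dump, the loss happens in the tree building and not in expat.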

 [2002-05-01 17:01 UTC] jmcastagnetto@php.net
BTW, I am using the latest version of XML_Tree from the CVS tree (pear/XML_Tree), and the XML_Parser included w/ PHP 4.2.0
 [2002-05-07 12:46 UTC] cox@php.net
The current implementation I did does not support mixed XML contents (#PCDATA + elements). If someone could tell me how DOM represents that, I could try to fix this issue.

This will be enough:
<?php
$xmlstr = "
<root><child>
This is a <tag bar='a'>foo</tag> mixed example
</child></root>";
print_r(domxml_xmltree($xmlstr));
?>

Thanks,

Tomas V.V.Cox
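
For anyone reading this later, a rough sketch of how a DOM tree holds mixed content. Note the assumption: it uses the DOMDocument class from the current PHP DOM extension as a stand-in, not the PHP 4 domxml functions mentioned above, and the $kinds array is only there for illustration. In the DOM model the text runs and the nested element are all sibling child nodes of <child>, kept in document order.

```php
<?php
// Assumption: DOMDocument (current DOM extension) stands in for domxml here.
$xmlstr = "<root><child>This is a <tag bar='a'>foo</tag> mixed example</child></root>";

$doc = new DOMDocument();
$doc->loadXML($xmlstr);

// <child> is the first (and only) child of the root element.
$child = $doc->documentElement->firstChild;

// Walk the direct children: text and elements appear side by side, in order.
$kinds = [];
foreach ($child->childNodes as $node) {
    if ($node instanceof DOMText) {
        $kinds[] = 'text: "' . $node->nodeValue . '"';
    } elseif ($node instanceof DOMElement) {
        $kinds[] = 'element: ' . $node->nodeName;
    }
}
print_r($kinds);
```

So a tree class that wants to support mixed content needs some way to store text as a child alongside element children, rather than one content string per element.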
 [2002-05-13 22:41 UTC] jmcastagnetto@php.net
I am not sure how libdomxml does the traversal, but here is the W3C spec for traversal:

http://www.w3.org/TR/2000/REC-DOM-Level-2-Traversal-Range-20001113/traversal.html


 [2002-05-17 08:20 UTC] cox@php.net
Ok, after some research time and a headache, I committed a fix for that. Please try it and tell me if the support is now correct.

-- Tomas V.V.Cox
 [2002-05-17 15:58 UTC] jmcastagnetto@php.net
The class now works as expected, and it does not choke on <![CDATA[]]>, although it loses the CDATA wrapper when dump()'ing to a string, but that is a minor thing.

Good work Tomas!
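
As a point of comparison for the CDATA remark, a small sketch of a round trip that keeps the section intact. It is an assumption that this is a useful reference: it uses DOMDocument from the current PHP DOM extension, not XML_Tree, and the input string is made up. There the parser keeps the section as its own node type, so serializing writes the wrapper back out.

```php
<?php
// Hypothetical input with a CDATA section holding a character that would
// otherwise need escaping.
$xmlstr = '<root><![CDATA[a < b]]></root>';

$doc = new DOMDocument();
$doc->loadXML($xmlstr);

// The section is kept as a DOMCdataSection node, not merged into plain text.
$node = $doc->documentElement->firstChild;
$isCdata = $node instanceof DOMCdataSection;
var_dump($isCdata);

// Serializing only the root element writes the CDATA wrapper back out.
$out = $doc->saveXML($doc->documentElement);
echo $out, "\n";
```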
 