Bug #38930 xml_parser_to_struct hoses html content
Submitted: 2006-09-22 20:55 UTC Modified: 2006-10-11 01:00 UTC
From: pd at xobase dot com Assigned:
Status: No Feedback Package: *XML functions
PHP Version: 5.1.6 OS: All
Private report: No CVE-ID: None

 [2006-09-22 20:55 UTC] pd at xobase dot com
Description:
------------
This relates to the XML/PHP content management system XObase.

XOb allows users to embed HTML content in the XML docs used to store data in the system.  The HTML is escaped and wrapped in a CDATA section.  This all worked perfectly in PHP versions prior to 5.1.x (5.0.x is OK).

What happens is that even if the file is saved correctly and is well formed, in some instances the HTML content in the file causes a badly formed XML object to be created.  The system calls xml_parse_into_struct on a string read from disk.

We were using implode(file($filearg),"");

In the newer PHP versions, the array returned from file() is not complete, so there may be an issue with the file() function (specifically when it is used together with xml_parse_into_struct).

We changed this to 
$fp = fopen($filearg,"r");
$data = fread($fp,filesize($filearg));
fclose($fp);

$data = file_get_contents($filearg); also returns an incomplete string from the XML doc.

The fread() method creates a good string, however.
The $xmlObj created in the reproduce code below is cut off at the node where the HTML exists.  This happens every time with certain HTML.  With other HTML it does not happen at all, or it shows up only once content that is fine when pasted a single time has been pasted in more than 1 or 2 times.
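
One way to pin down which read method actually truncates the data is to compare the byte count each one returns against filesize(). The following is only a sketch: compareReadMethods() is a hypothetical helper name, not part of XObase or PHP, and it assumes the file is readable and small enough to hold in memory.

```php
<?php
// Hypothetical helper (not part of XObase): report how many bytes each
// read method returns for the same file, next to what filesize() says.
function compareReadMethods($path)
{
    clearstatcache();                       // avoid a stale cached size
    $expected = filesize($path);

    $viaFile        = implode('', file($path));
    $viaGetContents = file_get_contents($path);

    $fp = fopen($path, 'rb');
    // fread() with a length of 0 is an error on newer PHP versions
    $viaFread = $expected > 0 ? fread($fp, $expected) : '';
    fclose($fp);

    return array(
        'filesize'          => $expected,
        'file'              => strlen($viaFile),
        'file_get_contents' => strlen($viaGetContents),
        'fread'             => strlen($viaFread),
    );
}
```

If all four numbers match for the affected document, the read step is probably not where the content is lost, and the parser call is the next thing to look at.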

I've pasted an example XML string below that shows the issue when it is opened.
Note that the offending HTML lives in the node

<BRANDINGHTML_01>

Reproduce code:
---------------
function createXMLStruc($filearg){
	if (is_file($filearg)){
		// Read the whole file in one fread() call
		$fp = fopen($filearg,"r");
		$data = fread($fp,filesize($filearg));
		fclose($fp);
		//$data = file_get_contents($filearg);
		//$data = implode(file($filearg),"");
		$p = xml_parser_create();
		xml_parser_set_option($p, XML_OPTION_SKIP_WHITE, 1);
		//xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, false);
		// The function already takes these two arguments by reference,
		// so no call-time "&" is needed (PHP 5.4 and later reject it)
		xml_parse_into_struct($p,$data,$structure,$index);
		xml_parser_free($p);
		$xmlObj[0]=$index;
		$xmlObj[1]=$structure;
		return $xmlObj;
	}else{
		//echo "Array createXMLstruc() function error :$filearg does not exist <br>";
	}
}

>>>>>>>>>>>>> Example doc
<?xml version="1.0"?>
<ARRAY_CLASS CLASS="content" TYPE="venue" STATUS="pending" ISCONTAINER="false" PARENTCLASS="venue.xml">
  <GENERAL TYPE="property-catagory" LABEL="General" DATAFORMAT="">
    <NAME TYPE="text" LABEL="Name" DATAFORMAT="singleline" SEARCH="true"><![CDATA[TestVenue]]></NAME>
    <ALIAS1 TYPE="alias" LABEL="Class Attributes" PROPERTY="ARRAY_CLASS"></ALIAS1>
    <RELEASE TYPE="date" LABEL="Release Date" DATAFORMAT="MTYHMS" SEARCH="false"></RELEASE>
    <EXPIRE TYPE="date" LABEL="Expire Date" DATAFORMAT="MTYHMS" SEARCH="false"></EXPIRE>
    <CLASS_TEMPLATES TYPE="classtemplates" DATAFORMAT="path" SEARCH="false" LABEL="Templates" SELECTED="0"></CLASS_TEMPLATES>
    <CREATIONDATE LABEL="Creation Date" TYPE="date" DATAFORMAT="mdyhms">1158958165</CREATIONDATE>
    <OWNER TYPE="resourceproperty" LABEL="Owner" PROPERTY="EMAIL" WHERE="users" SEARCH="false">/pdempsey.xml</OWNER>
    <VENUECODE TYPE="text" LABEL="Venue Code" DATAFORMAT="singleline"></VENUECODE>
    <ADDRESS1 TYPE="text" LABEL="Address 1" DATAFORMAT="singleline"></ADDRESS1>
    <ADDRESS2 TYPE="text" LABEL="Address 2" DATAFORMAT="singleline"></ADDRESS2>
    <CITY TYPE="text" LABEL="City" DATAFORMAT="singleline"></CITY>
    <STATE TYPE="text" LABEL="State" DATAFORMAT="simplestate"></STATE>
    <POSTALCODE TYPE="text" LABEL="Postal Code (Zip)" DATAFORMAT="singleline"></POSTALCODE>
  </GENERAL>
  <SUBVENUES TYPE="list" LABEL="Sub Venues" DATAFORMAT="menu" ROLES="BD,SB,RD,TM"></SUBVENUES>
  <BRANDING TYPE="property-catagory" LABEL="Branding Resources">
    <BRANDRECS TYPE="resourcelist" LABEL="Branding Resources"></BRANDRECS>
  </BRANDING>
  <BRANDINGHTML TYPE="userfields" LABEL="Branding HTML" DATAFORMAT="blob">
    <BRANDINGHTML_00 TYPE="html" LABEL="Branding HTML" DATAFORMAT="blob"><![CDATA[]]></BRANDINGHTML_00>
    <BRANDINGHTML_01 TYPE="html" LABEL="Discount Offer For Venue" DATAFORMAT="blob"><![CDATA[<table width="280" border="0" align="center" cellpadding="0" cellspacing="0">_carrige_return_newline  <tr>_carrige_return_newline    <td><img src="/ui-img/rule_tl.gif" width="10" height="10"></td>_carrige_return_newline    <td background="/ui-img/rule_ht.gif"><img src="/ui-img/blank.gif" width="5" height="5"></td>_carrige_return_newline    <td><img src="/ui-img/rule_tr.gif" width="10" height="10"></td>_carrige_return_newline  </tr>_carrige_return_newline  <tr>_carrige_return_newline    <td background="/ui-img/rule_vl.gif">?</td>_carrige_return_newline    <td><table cellpadding="0" cellspacing="0">_carrige_return_newline        <tr>_carrige_return_newline          <td valign="top"><font size="3"><strong>Winter Park Resort customers_carrige_return_newline                get <u><font size="4">25% Off</font></u> all orders!</strong></font>*<br>_carrige_return_newline            <img src="/ui-img/blank.gif" width="10" height="5"><br>_carrige_return_newline            <span class="small">Copy this code. You will be prompted to enter_carrige_return_newline            it during the checkout process to receive your special discount._carrige_return_newline            *Offer available for a limited time only. 
<br>_carrige_return_newline            <img src="/ui-img/blank.gif" width="10" height="5"><br>_carrige_return_newline            <em>special offer code:</em> </span>_carrige_return_newline            <table width="20" border="1" cellpadding="5" cellspacing="0" bgcolor="#FFFF99">_carrige_return_newline              <tr>_carrige_return_newline                <td><strong><font size="3">wpnewsite</font></strong></td>_carrige_return_newline              </tr>_carrige_return_newline            </table>_carrige_return_newline          </td>_carrige_return_newline        </tr>_carrige_return_newline      </table>_carrige_return_newline    </td>_carrige_return_newline    <td background="/ui-img/rule_vr.gif">?</td>_carrige_return_newline  </tr>_carrige_return_newline  <tr>_carrige_return_newline    <td><img src="/ui-img/rule_bl.gif" width="10" height="10"></td>_carrige_return_newline    <td background="/ui-img/rule_hb.gif"><img src="/ui-img/blank.gif" width="5" height="5"></td>_carrige_return_newline    <td><img src="/ui-img/rule_br.gif" width="10" height="10"></td>_carrige_return_newline  </tr>_carrige_return_newline</table>]]></BRANDINGHTML_01>
  </BRANDINGHTML>
  <SEARCH TYPE="property-catagory" LABEL="Search Keys" DATAFORMAT="">
    <KEYWORDS TYPE="keywords" DATAFORMAT="multiline" LABEL="Automatic Keywords" SEARCH="false">,TestVenue,</KEYWORDS>
    <SEARCHKEYPAIRS TYPE="keywords" DATAFORMAT="multiline" LABEL="Automatic Property-pairs" SEARCH="false"></SEARCHKEYPAIRS>
    <USERKEYWORDS TYPE="text" DATAFORMAT="multiline" LABEL="Addtional Keywords" SEARCH="false"></USERKEYWORDS>
    <DESCRIPTION TYPE="text" LABEL="description" DATAFORMAT="multiline"></DESCRIPTION>
  </SEARCH>
  <ROLES TYPE="rolelist" SEARCH="false">
    <ADMIN TYPE="role" LABEL="administrator">654321</ADMIN>
    <CONTENT_MANAGER TYPE="role" LABEL="Content Manager">654321</CONTENT_MANAGER>
  </ROLES>
</ARRAY_CLASS>


 [2006-09-25 12:17 UTC] tony2001@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.
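
For illustration, a script of roughly the requested shape could look like the sketch below. It is not the submitter's code: the element names and the shortened HTML are made up for the example, the XML is built in memory so no external file is needed, and the closing ?> is left off, which PHP allows in a pure-PHP file. Whether this cut-down input still triggers the reported truncation is not known.

```php
<?php
// Hypothetical minimal input: one element whose CDATA section holds HTML.
$html = '<table width="280"><tr><td><strong>25% Off</strong></td></tr></table>';
$data = '<?xml version="1.0"?>'
      . '<ROOT><BRANDINGHTML_01 TYPE="html"><![CDATA[' . $html . ']]>'
      . '</BRANDINGHTML_01></ROOT>';

$p = xml_parser_create();
xml_parser_set_option($p, XML_OPTION_SKIP_WHITE, 1);

// $structure and $index are filled in by reference
$ok = xml_parse_into_struct($p, $data, $structure, $index);

// Case folding is on by default, so index keys are upper-cased tag names
$pos   = $index['BRANDINGHTML_01'][0];
$value = $structure[$pos]['value'];

echo "parse result: ", $ok, "\n";
echo "entries in structure: ", count($structure), "\n";
echo "HTML round-trips intact: ", ($value === $html ? "yes" : "no"), "\n";
```

Swapping the short $html string for the real content of the failing node would show whether the problem depends on the content itself or on how the file is read.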


 [2006-10-03 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2006-10-03 17:08 UTC] pd at xobase dot com
You can use the code fragments below.  If I open the XML document included, and use the

xml_parse_into_struct($p,$data,$structure,$index);

function, I get a broken object.  The XML is well formed, but something in the embedded HTML is breaking the parser.  Also I've noticed that if I use file($filepath) on this, or file_get_contents($filepath), the resulting file read array is broken as well.  So there may be a bug in the file() function or its sub-system.
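
To see whether the parser itself is reporting a problem, the return value of xml_parse_into_struct() and the parser's error functions can be checked. The sketch below assumes nothing about XObase; parseWithDiagnostics() is a hypothetical wrapper name used only for this example.

```php
<?php
// Hypothetical wrapper: parse a string and, on failure, report where and
// why the parser stopped instead of silently returning a partial result.
function parseWithDiagnostics($data)
{
    $p = xml_parser_create();
    xml_parser_set_option($p, XML_OPTION_SKIP_WHITE, 1);

    $structure = array();
    $index     = array();

    // xml_parse_into_struct() returns 1 on success and 0 on failure
    if (xml_parse_into_struct($p, $data, $structure, $index) === 1) {
        return array('ok' => true, 'index' => $index, 'structure' => $structure);
    }

    $code = xml_get_error_code($p);
    return array(
        'ok'      => false,
        'message' => xml_error_string($code),
        'line'    => xml_get_current_line_number($p),
        'column'  => xml_get_current_column_number($p),
        'byte'    => xml_get_current_byte_index($p),
    );
}
```

If 'ok' comes back false for the attached document, the line and byte position should point at the part of the HTML that the parser rejects; if it comes back true but the structure is still short, the loss happens somewhere other than in a reported parse error.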
 [2006-10-03 17:37 UTC] tony2001@php.net
Please provide short and complete reproduce code.
Thank you.
 [2006-10-11 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".