Bug #11321 memory corruption when parsing large XML files
Submitted: 2001-06-06 16:18 UTC Modified: 2002-02-27 00:00 UTC
Votes:6
Avg. Score:4.8 ± 0.4
Reproduced:5 of 5 (100.0%)
Same Version:0 (0.0%)
Same OS:2 (40.0%)
From: olivier at lx dot student dot wau dot nl Assigned:
Status: No Feedback Package: XML related
PHP Version: 4.0.5 OS: Linux
Private report: No CVE-ID: None
 [2001-06-06 16:18 UTC] olivier at lx dot student dot wau dot nl
When parsing large XML files, the PHP module starts corrupting memory. In large applications this may result in a segfault; in this example application it results in corruption of the XML parser itself (at least on my test machines).

The code below was tested on three different servers: two running Debian/Stable (Potato) with a custom-compiled PHP 4.0.5, and one running Debian/Unstable (Sid) with the Debian default PHP module (also 4.0.5). All run PHP as an Apache module.

The custom-built modules are built like this:
./configure --with-apxs=/usr/bin/apxs --with-mysql=/usr --with-config-file-path=/etc/php4/apache --with-interbase=shared --with-gnu-ld --with-xml --with-gd=shared

With a small XML file the code below runs fine, but when the XML file gets bigger PHP starts complaining "Unable to call handler end_tag_handler()", "Unable to call handler data_handler()" or "Unable to call handler start_tag_handler()".

The code:
<?php

class properties {
	var $parser;
}

class my_class {
	var $mydata;
	var $mystarttag;
	var $myendtag;

	function my_class(&$props) {
		xml_set_object($props->parser,&$this);
		xml_set_element_handler($props->parser, 'start_tag_handler', 'end_tag_handler');
		xml_set_character_data_handler($props->parser, 'data_handler');
	}
	
	function data_handler($parser, $data) {
		$this->mydata .= $data;
	}
	function start_tag_handler($parser, $name, $attrs) {
		$this->mystarttag .= $name;
	}
	function end_tag_handler($parser, $name) {
		$this->myendtag .= $name;
		echo 'end tag is '.$name.'<br>';
	}
}

function first_element_handler($parser, $name, $attrs) {
	global $props;
	if ($name == 'QUESTION') {
		/* normally I decide here, depending on $attrs what 
		kind of object I should create. For debugging purposes
		I use just one object. */
		$q_obj = new my_class($props);
	
	} else {
		die('This XML file has an unknown content');
	}
}

function start_parsing($file) {
	global $props;
	
	$props = new properties();
	$props->parser = xml_parser_create();
	xml_parser_set_option($props->parser, XML_OPTION_CASE_FOLDING, TRUE);
	xml_set_element_handler($props->parser, 'first_element_handler', '');
	if (!($fp = fopen($file, "r"))) {
		die("could not open XML input");
	}

	while ($data = fread($fp, 4096)) {
		if (!xml_parse($props->parser, $data, feof($fp))) {
			die(sprintf("XML error: %s at line %d",
					xml_error_string(xml_get_error_code($props->parser)),
					xml_get_current_line_number($props->parser)));
		}
	}
	xml_parser_free($props->parser);
}
echo 'start parsing';
start_parsing('val1.xml');
echo 'finished parsing';
?>
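
For comparison, here is a minimal sketch of the same kind of parse for current PHP versions. It rests on an assumption that nothing in this report confirms: that the handler object has to stay referenced for as long as the parser may call it, whereas `$q_obj` above is local to first_element_handler(). The class `QuestionHandler` and the function `parse_chunks()` are hypothetical names, and unlike the script above the handlers are registered once, before parsing starts, rather than swapped from inside a handler.

```php
<?php
// Hypothetical handler class: collects tag names and character data.
class QuestionHandler {
    public $data = '';
    public $startTags = [];
    public $endTags = [];

    public function startTag($parser, $name, $attrs) {
        $this->startTags[] = $name;
    }

    public function endTag($parser, $name) {
        $this->endTags[] = $name;
    }

    public function characterData($parser, $data) {
        $this->data .= $data;
    }
}

// Hypothetical helper: feeds a list of string chunks to one parser,
// the way the fread() loop above feeds 4096-byte blocks.
function parse_chunks(array $chunks) {
    $chunks = array_values($chunks);
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, true);

    // $handler stays in scope until the last chunk has been parsed.
    $handler = new QuestionHandler();
    xml_set_element_handler($parser, [$handler, 'startTag'], [$handler, 'endTag']);
    xml_set_character_data_handler($parser, [$handler, 'characterData']);

    $last = count($chunks) - 1;
    foreach ($chunks as $i => $chunk) {
        // The third argument marks the final chunk of the document.
        if (!xml_parse($parser, $chunk, $i === $last)) {
            throw new RuntimeException(sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
    }
    return $handler;
}
```

Calling `parse_chunks()` with a document split at an arbitrary point exercises the same chunk boundary handling as the fread() loop; the character data handler may be called several times for one text node, which is why it appends rather than assigns.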

The XML file looks like this, but to see the problems the part between <question> and </question> should be four times bigger:

<question type="VALUE">
<title>Fermentor size</title>
<hint>4032 hours a year, makes 96 runs a year. How many kg per cubic meter per year do you produce?</hint>
<hint>That is 10 X 96 = 960 kg per cubic meter per year</hint>
<hint>You need 10.000 kg, so you need 10.4 cubic meter production</hint>
<hint>You can't fill the reactor 100%, so let's assume 12 cubic meter will do</hint>
<range correct="1">
<minval>11</minval>
<maxval>15</maxval>
<feedback>Very good! &lt;b&gt;Assuming&lt;/b&gt; you can fill a reactor....</feedback>
</range>
<range correct="0">
<minval>1</minval>
<maxval>11</maxval>
<feedback>Sorry, try again</feedback>
</range>
</question>
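
A test file of the required size can be generated rather than written by hand. The sketch below is one way to do that; the report does not say how the original file was enlarged, so repeating a block of child elements is an assumption, and `build_large_question()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: wraps $times copies of $inner in one <question> element.
function build_large_question(string $inner, int $times): string {
    return '<question type="VALUE">' . "\n"
         . str_repeat($inner, $times)
         . '</question>' . "\n";
}

// A block of child elements taken from the sample above.
$inner = "<title>Fermentor size</title>\n"
       . "<range correct=\"0\">\n"
       . "<minval>1</minval>\n"
       . "<maxval>11</maxval>\n"
       . "<feedback>Sorry, try again</feedback>\n"
       . "</range>\n";

// Four copies, matching the "four times bigger" estimate.
$xml = build_large_question($inner, 4);
```

Writing `$xml` to disk with `file_put_contents()` and passing that file name to start_parsing() gives a repeatable input; raising the repeat count makes it easy to look for the size at which the handler errors begin.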

 [2002-01-14 09:59 UTC] lobbin@php.net
Is this reproducible on 4.1.1?
 [2002-02-27 00:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2006-10-12 14:33 UTC] jnicoll at seemysites dot net
I have also experienced this problem (or a similar problem) with PHP4's native XML DOM techniques. It also happens with DOM in PHP5. I am running PHP4 on my web server (Linux, don't know which version) and on my test machine (Mac OS X 10.4.8). Every process I try chokes on Clickbank's 6MB XML file and only outputs garbage data. PHP5's SimpleXML seems to work well.
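
As an illustration of the SimpleXML route mentioned in this comment, here is a minimal sketch that reads the sample document from the original report. The function `correct_ranges()` is a hypothetical name, and the inline document is a shortened copy of the sample; note that SimpleXML loads the whole document into memory, so this shows the API rather than a fix for the reported problem.

```php
<?php
// Shortened copy of the sample document from the original report.
$xml = <<<XML
<question type="VALUE">
<title>Fermentor size</title>
<range correct="1"><minval>11</minval><maxval>15</maxval></range>
<range correct="0"><minval>1</minval><maxval>11</maxval></range>
</question>
XML;

// Hypothetical helper: returns [min, max] pairs of ranges marked correct="1".
function correct_ranges(string $xml): array {
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        throw new RuntimeException('could not parse XML input');
    }
    $out = [];
    foreach ($doc->range as $range) {
        // Attributes are read with array syntax, child elements as properties.
        if ((string) $range['correct'] === '1') {
            $out[] = [(int) $range->minval, (int) $range->maxval];
        }
    }
    return $out;
}
```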
 [2008-08-29 15:08 UTC] jpyeron at pdinc dot us
having this issue on:
PHP Version 4.3.2
System  Windows NT xxxxxxxxxxxxxxx 5.2 build 3790  
Build Date  May 28 2003 15:06:05  
Server API  Apache  
Virtual Directory Support  enabled  
Configuration File (php.ini) Path  C:\WINDOWS\php.ini  
PHP API  20020918  
PHP Extension  20020429  
Zend Extension  20021010  
Debug Build  no  
Thread Safety  enabled  
Registered PHP Streams  php, http, ftp, compress.zlib 


and on:
PHP Version 4.3.9 

System  Linux xxxxxxxxxxxxxxxxxxx 2.6.9-55.ELsmp #1 SMP Wed May 2 14:28:44 EDT 2007 i686  
Build Date  Sep 20 2007 19:37:02  
Configure Command  './configure' '--build=i686-redhat-linux-gnu' '--host=i686-redhat-linux-gnu' '--target=i386-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/usr/com' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--cache-file=../config.cache' '--with-config-file-path=/etc' '--with-config-file-scan-dir=/etc/php.d' '--enable-force-cgi-redirect' '--disable-debug' '--enable-pic' '--disable-rpath' '--enable-inline-optimization' '--with-bz2' '--with-db4=/usr' '--with-curl' '--with-exec-dir=/usr/bin' '--with-freetype-dir=/usr' '--with-png-dir=/usr' '--with-gd=shared' '--enable-gd-native-ttf' '--without-gdbm' '--with-gettext' '--with-ncurses=shared' '--with-gmp' '--with-iconv' '--with-jpeg-dir=/usr' '--with-openssl' '--with-png' '--with-pspell' '--with-xml' '--with-expat-dir=/usr' '--with-dom=shared,/usr' '--with-dom-xslt=/usr' '--with-dom-exslt=/usr' '--with-xmlrpc=shared' '--with-pcre-regex=/usr' '--with-zlib' '--with-layout=GNU' '--enable-bcmath' '--enable-exif' '--enable-ftp' '--enable-magic-quotes' '--enable-sockets' '--enable-sysvsem' '--enable-sysvshm' '--enable-track-vars' '--enable-trans-sid' '--enable-yp' '--enable-wddx' '--with-pear=/usr/share/pear' '--with-imap=shared' '--with-imap-ssl' '--with-kerberos' '--with-ldap=shared' '--with-mysql=shared,/usr' '--with-pgsql=shared' '--with-snmp=shared,/usr' '--with-snmp=shared' '--enable-ucd-snmp-hack' '--with-unixODBC=shared,/usr' '--enable-memory-limit' '--enable-shmop' '--enable-calendar' '--enable-dbx' '--enable-dio' '--enable-mbstring=shared' '--enable-mbstr-enc-trans' '--enable-mbregex' '--with-mime-magic=/usr/share/file/magic.mime' '--with-apxs2=/usr/sbin/apxs'  
Server API  Apache 2.0 Handler  
Virtual Directory Support  disabled  
Configuration File (php.ini) Path  /etc/php.ini  
Scan this dir for additional .ini files  /etc/php.d  
additional .ini files parsed  /etc/php.d/oci8.ini  
PHP API  20020918  
PHP Extension  20020429  
Zend Extension  20021010  
Debug Build  no  
Thread Safety  disabled  
Registered PHP Streams  php, http, ftp, https, ftps, compress.bzip2, compress.zlib
 [2008-08-29 16:06 UTC] jpyeron at pdinc dot us
Sorry, false alarm: this exact problem has not manifested for us.
We had a buffering / memory allocation issue.
 