Bug #11321 memory corruption when parsing large XML files
Submitted: 2001-06-06 16:18 UTC Modified: 2002-02-27 00:00 UTC
Votes:6
Avg. Score:4.8 ± 0.4
Reproduced:5 of 5 (100.0%)
Same Version:0 (0.0%)
Same OS:2 (40.0%)
From: olivier at lx dot student dot wau dot nl Assigned:
Status: No Feedback Package: XML related
PHP Version: 4.0.5 OS: Linux
Private report: No CVE-ID: None
 [2001-06-06 16:18 UTC] olivier at lx dot student dot wau dot nl
When parsing large XML files, the PHP module starts corrupting memory. In large applications this may result in a segfault; in this example application it results in corruption of the XML parser itself (at least on my test machines).

The code below was tested on three different servers: two running Debian/Stable (Potato) with a custom-compiled PHP 4.0.5, and one running Debian/Unstable (Sid) with the Debian default PHP module (also 4.0.5). All run PHP as an Apache module.

The custom-built modules are built like this:
./configure --with-apxs=/usr/bin/apxs --with-mysql=/usr --with-config-file-path=/etc/php4/apache --with-interbase=shared --with-gnu-ld --with-xml --with-gd=shared

When using a small XML file, the code below runs fine, but when the XML file gets bigger PHP starts complaining "Unable to call handler end_tag_handler()", "Unable to call handler data_handler()" or "Unable to call handler start_tag_handler()".

The code:
<?php

class properties {
	var $parser;
}

class my_class {
	var $mydata;
	var $mystarttag;
	var $myendtag;

	function my_class(&$props) {
		xml_set_object($props->parser,&$this);
		xml_set_element_handler($props->parser, 'start_tag_handler', 'end_tag_handler');
		xml_set_character_data_handler($props->parser, 'data_handler');
	}
	
	function data_handler($parser, $data) {
		$this->mydata .= $data;
	}
	function start_tag_handler($parser, $name, $attrs) {
		$this->mystarttag .= $name;
	}
	function end_tag_handler($parser, $name) {
		$this->myendtag .= $name;
		echo 'end tag is '.$name.'<br>';
	}
}

function first_element_handler($parser, $name, $attrs) {
	global $props;
	if ($name == 'QUESTION') {
		/* normally I decide here, depending on $attrs what 
		kind of object I should create. For debugging purposes
		I use just one object. */
		$q_obj = new my_class($props);
	
	} else {
		die('This XML file has an unknown content');
	}
}

function start_parsing($file) {
	global $props;
	
	$props = new properties();
	$props->parser = xml_parser_create();
	xml_parser_set_option($props->parser, XML_OPTION_CASE_FOLDING, TRUE);
	xml_set_element_handler($props->parser, 'first_element_handler', '');
	if (!($fp = fopen($file, "r"))) {
		die("could not open XML input");
	}

	while ($data = fread($fp, 4096)) {
		if (!xml_parse($props->parser, $data, feof($fp))) {
			die(sprintf("XML error: %s at line %d",
					xml_error_string(xml_get_error_code($props->parser)),
					xml_get_current_line_number($props->parser)));
		}
	}
	xml_parser_free($props->parser);
}
echo 'start parsing';
start_parsing('val1.xml');
echo 'finished parsing';
?>

The XML file looks like this, but to see the problems the part between <question> and </question> should be four times bigger:

<question type="VALUE">
<title>Fermentor size</title>
<hint>4032 hours a year, makes 96 runs a year. How many kg per cubic meter per year do you produce?</hint>
<hint>That is 10 X 96 = 960 kg per cubic meter per year</hint>
<hint>You need 10.000 kg, so you need 10.4 cubic meter production</hint>
<hint>You can't fill the reactor 100%, so let's assume 12 cubic meter will do</hint>
<range correct="1">
<minval>11</minval>
<maxval>15</maxval>
<feedback>Very good! &lt;b&gt;Assuming&lt;/b&gt; you can fill a reactor....</feedback>
</range>
<range correct="0">
<minval>1</minval>
<maxval>11</maxval>
<feedback>Sorry, try again</feedback>
</range>
</question>
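
For anyone trying to reproduce this on a current PHP build, here is a minimal sketch of the same test under some loudly stated assumptions. It builds a larger document by repeating a block inside <question> (the report says the content must be about four times bigger), feeds it to xml_parse() in chunks, and keeps the handler object referenced for the whole parse. Note that it does not re-register handlers from inside a callback as the reproducer does; it swaps in a single dispatcher object that delegates instead. The names QuestionHandler, Dispatcher, build_large_question() and parse_in_chunks() are illustrative and do not come from the report, and nothing here is claimed to be the cause of, or the fix for, the original PHP 4.0.5 behaviour.

```php
<?php
// Illustrative sketch only: QuestionHandler, Dispatcher, build_large_question()
// and parse_in_chunks() are hypothetical names, not part of the original report.

class QuestionHandler
{
    public $data = '';
    public $startTags = [];
    public $endTags = [];

    public function startTag($name, $attrs) { $this->startTags[] = $name; }
    public function endTag($name)           { $this->endTags[] = $name; }
    public function characterData($data)    { $this->data .= $data; }
}

class Dispatcher
{
    // Holding the delegate in a property keeps it alive for the whole parse.
    public $delegate = null;

    public function startTag($parser, $name, $attrs)
    {
        if ($this->delegate === null) {
            // The first element decides which handler object to create.
            if ($name !== 'QUESTION') {
                throw new RuntimeException('This XML file has unknown content');
            }
            $this->delegate = new QuestionHandler();
            return;
        }
        $this->delegate->startTag($name, $attrs);
    }

    public function endTag($parser, $name)
    {
        if ($this->delegate !== null) {
            $this->delegate->endTag($name);
        }
    }

    public function characterData($parser, $data)
    {
        if ($this->delegate !== null) {
            $this->delegate->characterData($data);
        }
    }
}

// Build a test document by repeating one <range> block $repeat times.
function build_large_question($repeat)
{
    $inner = "<range correct=\"0\">\n<minval>1</minval>\n<maxval>11</maxval>\n"
           . "<feedback>Sorry, try again</feedback>\n</range>\n";
    return "<question type=\"VALUE\">\n<title>Fermentor size</title>\n"
         . str_repeat($inner, $repeat) . "</question>\n";
}

// Feed the document to the parser in fixed-size chunks, as the reproducer does with fread().
function parse_in_chunks($xml, $chunkSize = 4096)
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, true);

    $dispatcher = new Dispatcher();
    xml_set_element_handler($parser, [$dispatcher, 'startTag'], [$dispatcher, 'endTag']);
    xml_set_character_data_handler($parser, [$dispatcher, 'characterData']);

    $length = strlen($xml);
    for ($offset = 0; $offset < $length; $offset += $chunkSize) {
        $chunk = substr($xml, $offset, $chunkSize);
        $isFinal = ($offset + $chunkSize >= $length);
        if (!xml_parse($parser, $chunk, $isFinal)) {
            throw new RuntimeException(sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
    }
    return $dispatcher->delegate;
}

$handler = parse_in_chunks(build_large_question(4), 64);
echo count($handler->endTags), " end tags seen\n";
```

The small chunk size in the usage line is only there to force several xml_parse() calls on a short document; with a real file, the 4096-byte reads from the reproducer would serve the same purpose.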

 [2002-01-14 09:59 UTC] lobbin@php.net
Is this reproducible on 4.1.1?
 [2002-02-27 00:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2006-10-12 14:33 UTC] jnicoll at seemysites dot net
I have also experienced this problem (or a similar one) with PHP4's native XML DOM techniques. It also happens with DOM in PHP5. I am running PHP4 on my web server (Linux, don't know which version) and on my test machine (Mac OS X 10.4.8). Every process I try chokes on Clickbank's 6MB XML file and only outputs garbage data. PHP5's SimpleXML seems to work well.
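
As a rough illustration of the SimpleXML route mentioned above, the following sketch reads a document shaped like the sample from the original report. The function name summarize_question() and the returned array layout are assumptions made for this example. SimpleXML loads the whole document into memory, so whether it copes with a file of a given size depends on the memory available to the script.

```php
<?php
// Illustrative sketch: summarize_question() is a hypothetical helper,
// and the sample below is a shortened copy of the XML in the report.

$xml = <<<XML
<question type="VALUE">
<title>Fermentor size</title>
<range correct="1">
<minval>11</minval>
<maxval>15</maxval>
<feedback>Very good!</feedback>
</range>
<range correct="0">
<minval>1</minval>
<maxval>11</maxval>
<feedback>Sorry, try again</feedback>
</range>
</question>
XML;

function summarize_question($xml)
{
    $question = simplexml_load_string($xml);
    if ($question === false) {
        throw new RuntimeException('could not parse XML input');
    }

    $ranges = [];
    foreach ($question->range as $range) {
        $ranges[] = [
            // Attributes are read with array syntax, child elements as properties.
            'correct'  => ((string) $range['correct']) === '1',
            'min'      => (int) $range->minval,
            'max'      => (int) $range->maxval,
            'feedback' => (string) $range->feedback,
        ];
    }

    return [
        'type'   => (string) $question['type'],
        'title'  => (string) $question->title,
        'ranges' => $ranges,
    ];
}

$summary = summarize_question($xml);
echo $summary['title'], ': ', count($summary['ranges']), " ranges\n";
```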
 [2008-08-29 15:08 UTC] jpyeron at pdinc dot us
Having this issue on:
PHP Version 4.3.2
System  Windows NT xxxxxxxxxxxxxxx 5.2 build 3790  
Build Date  May 28 2003 15:06:05  
Server API  Apache  
Virtual Directory Support  enabled  
Configuration File (php.ini) Path  C:\WINDOWS\php.ini  
PHP API  20020918  
PHP Extension  20020429  
Zend Extension  20021010  
Debug Build  no  
Thread Safety  enabled  
Registered PHP Streams  php, http, ftp, compress.zlib 


and on:
PHP Version 4.3.9 

System  Linux xxxxxxxxxxxxxxxxxxx 2.6.9-55.ELsmp #1 SMP Wed May 2 14:28:44 EDT 2007 i686  
Build Date  Sep 20 2007 19:37:02  
Configure Command  './configure' '--build=i686-redhat-linux-gnu' '--host=i686-redhat-linux-gnu' '--target=i386-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib' '--libexecdir=/usr/libexec' '--localstatedir=/var' '--sharedstatedir=/usr/com' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--cache-file=../config.cache' '--with-config-file-path=/etc' '--with-config-file-scan-dir=/etc/php.d' '--enable-force-cgi-redirect' '--disable-debug' '--enable-pic' '--disable-rpath' '--enable-inline-optimization' '--with-bz2' '--with-db4=/usr' '--with-curl' '--with-exec-dir=/usr/bin' '--with-freetype-dir=/usr' '--with-png-dir=/usr' '--with-gd=shared' '--enable-gd-native-ttf' '--without-gdbm' '--with-gettext' '--with-ncurses=shared' '--with-gmp' '--with-iconv' '--with-jpeg-dir=/usr' '--with-openssl' '--with-png' '--with-pspell' '--with-xml' '--with-expat-dir=/usr' '--with-dom=shared,/usr' '--with-dom-xslt=/usr' '--with-dom-exslt=/usr' '--with-xmlrpc=shared' '--with-pcre-regex=/usr' '--with-zlib' '--with-layout=GNU' '--enable-bcmath' '--enable-exif' '--enable-ftp' '--enable-magic-quotes' '--enable-sockets' '--enable-sysvsem' '--enable-sysvshm' '--enable-track-vars' '--enable-trans-sid' '--enable-yp' '--enable-wddx' '--with-pear=/usr/share/pear' '--with-imap=shared' '--with-imap-ssl' '--with-kerberos' '--with-ldap=shared' '--with-mysql=shared,/usr' '--with-pgsql=shared' '--with-snmp=shared,/usr' '--with-snmp=shared' '--enable-ucd-snmp-hack' '--with-unixODBC=shared,/usr' '--enable-memory-limit' '--enable-shmop' '--enable-calendar' '--enable-dbx' '--enable-dio' '--enable-mbstring=shared' '--enable-mbstr-enc-trans' '--enable-mbregex' '--with-mime-magic=/usr/share/file/magic.mime' '--with-apxs2=/usr/sbin/apxs'  
Server API  Apache 2.0 Handler  
Virtual Directory Support  disabled  
Configuration File (php.ini) Path  /etc/php.ini  
Scan this dir for additional .ini files  /etc/php.d  
additional .ini files parsed  /etc/php.d/oci8.ini  
PHP API  20020918  
PHP Extension  20020429  
Zend Extension  20021010  
Debug Build  no  
Thread Safety  disabled  
Registered PHP Streams  php, http, ftp, https, ftps, compress.bzip2, compress.zlib
 [2008-08-29 16:06 UTC] jpyeron at pdinc dot us
Sorry, false alarm: this exact problem has not manifested for us.
We had a buffering issue / memory allocation issue.
 