Bug #49660 libxml 2.7.3+ limits text nodes to 10MB
Submitted: 2009-09-24 15:37 UTC Modified: 2011-12-02 12:30 UTC
Votes:8
Avg. Score:4.6 ± 0.5
Reproduced:8 of 8 (100.0%)
Same Version:2 (25.0%)
Same OS:1 (12.5%)
From: sta at netimage dot dk Assigned: gooh (profile)
Status: Closed Package: SOAP related
PHP Version: * OS: FreeBSD 7.1
Private report: No CVE-ID: None
 [2009-09-24 15:37 UTC] sta at netimage dot dk
Description:
------------
Since version 2.7.3 libxml limits the maximum size of a single text node to 10MB.
The limit can be removed with a new option, XML_PARSE_HUGE.
PHP has no way to specify this option to libxml.

I found the bug when making a SOAP request where the reply contained a 20MB string.
SoapClient->__call() threw an exception: 'looks like we got no XML document'

Using libxml_use_internal_errors(true) and libxml_get_errors() I could narrow it down to a LibXMLError with code 5, 'Extra content at the end of the document', but the reported line and column were in the middle of a large text node.
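
For readers who want to repeat this diagnosis, here is a minimal sketch of collecting libxml errors with the two functions named above. The helper name collect_xml_errors() and the malformed sample document are assumptions made for illustration; they are not part of the original report.

```php
<?php
// Sketch: buffer libxml errors instead of letting them surface as PHP warnings.
// collect_xml_errors() is a hypothetical helper name.
function collect_xml_errors($xml)
{
    $previous = libxml_use_internal_errors(true); // switch to internal buffering
    libxml_clear_errors();

    simplexml_load_string($xml); // result ignored; only the errors matter here

    $report = [];
    foreach (libxml_get_errors() as $error) {
        // LibXMLError exposes code, line, column and message, among others.
        $report[] = sprintf(
            'code %d at line %d, column %d: %s',
            $error->code,
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    return $report;
}

// A deliberately malformed document, used only to show the report format.
foreach (collect_xml_errors('<test>unclosed') as $line) {
    echo $line, "\n";
}
```

The reported position is only as good as what libxml records, which is why the line and column can point into the middle of a text node, as described above.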

Using SoapClient->__getLastResponse() I saved the response to a file.
The libxml2 program xmllint then revealed the cause:
> xmllint --noout soap_response.txt 
soap_response.txt:111834: error: xmlSAX2Characters: huge text node: out of memory

We need a way to specify the XML_PARSE_HUGE option to libxml - perhaps something like a new function: libxml_parse_huge(true).


Reproduce code:
---------------
<?php
$xml = "<?xml version='1.0' encoding='utf-8' standalone='yes' ?><test>" . str_repeat('A', 12000000) . "</test>";
file_put_contents('file.xml', $xml);
$sxe = simplexml_load_file('file.xml');
if ($sxe instanceof SimpleXMLElement) {
	echo "OK\n";
}
else {
	var_dump($sxe);
}


Expected result:
----------------
OK

Actual result:
--------------
PHP Warning:  simplexml_load_file(): file.xml:1: error: xmlSAX2Characters: huge text node: out of memory in /usr/dana/data/developers/holst/mobilmap/cron/xml.php on line 5
PHP Warning:  simplexml_load_file(): AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA in /usr/dana/data/developers/holst/mobilmap/cron/xml.php on line 5
PHP Warning:  simplexml_load_file():                                                                                ^ in /usr/dana/data/developers/holst/mobilmap/cron/xml.php on line 5
bool(false)

 [2009-09-26 12:12 UTC] kalle@php.net
I guess we could expose the constant value from ext/libxml if available like:
Index: libxml.c
===================================================================
--- libxml.c	(revision 288659)
+++ libxml.c	(working copy)
@@ -622,6 +622,9 @@
 	REGISTER_LONG_CONSTANT("LIBXML_COMPACT",	XML_PARSE_COMPACT,		CONST_CS | CONST_PERSISTENT);
 	REGISTER_LONG_CONSTANT("LIBXML_NOXMLDECL",	XML_SAVE_NO_DECL,		CONST_CS | CONST_PERSISTENT);
 #endif
+#if LIBXML_VERSION >= 20703
+	REGISTER_LONG_CONSTANT("LIBXML_PARSEHUGE",	XML_PARSE_HUGE,			CONST_CS | CONST_PERSISTENT);
+#endif
 	REGISTER_LONG_CONSTANT("LIBXML_NOEMPTYTAG",	LIBXML_SAVE_NOEMPTYTAG,	CONST_CS | CONST_PERSISTENT);
 
 	/* Error levels */


Does this work for you when passing it to SimpleXML's $option parameter?

(Patch made against PHP_5_3, but is just a 3 line c/p to other branches)
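
Since the patch registers the constant only when PHP is built against libxml 2.7.3 or later, a script may want to guard its use. The following is a rough sketch under that assumption; it uses simplexml_load_string() on an in-memory document instead of the file from the reproduce code, and the fallback to 0 (no extra options) is an illustrative choice.

```php
<?php
// Sketch: use LIBXML_PARSEHUGE only when this PHP build defines it.
$options = defined('LIBXML_PARSEHUGE') ? LIBXML_PARSEHUGE : 0;

// Same shape as the reproduce code: one text node of 12,000,000 characters.
$xml = "<?xml version='1.0' encoding='utf-8'?><test>"
     . str_repeat('A', 12000000)
     . "</test>";

$sxe = simplexml_load_string($xml, 'SimpleXMLElement', $options);
if ($sxe instanceof SimpleXMLElement) {
    echo "OK\n";
} else {
    echo "parse failed\n";
}
```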
 [2009-09-28 11:31 UTC] sta at netimage dot dk
Hi Kalle.

Thanks for replying so soon.

Your patch does fix the issue when loading directly with 
simplexml_load_file('file.xml', 'SimpleXMLElement', LIBXML_PARSEHUGE);

However the error was not originally encountered using simplexml, but using SoapClient - and there does not seem to be a way to pass libxml-options to SoapClient.

So as far as I can tell, the problem remains when using SoapClient.

/Thing
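
One possible workaround, sketched here as an assumption and not as something SoapClient supports directly, is to take the raw response (for example from SoapClient->__getLastResponse(), as in the original report) and parse it separately with DOMDocument, which does accept libxml options. The helper name parse_huge_soap_body(), the payload element, and the SOAP 1.1 namespace are illustrative choices.

```php
<?php
// Sketch only: parse a raw SOAP response outside SoapClient so that
// libxml options can be supplied. parse_huge_soap_body() is hypothetical.
function parse_huge_soap_body($rawXml)
{
    $options = defined('LIBXML_PARSEHUGE') ? LIBXML_PARSEHUGE : 0;

    $doc = new DOMDocument();
    if (!$doc->loadXML($rawXml, $options)) {
        return null; // parsing failed
    }

    $xpath = new DOMXPath($doc);
    // SOAP 1.1 envelope namespace; SOAP 1.2 uses a different URI.
    $xpath->registerNamespace('soap', 'http://schemas.xmlsoap.org/soap/envelope/');

    $body = $xpath->query('/soap:Envelope/soap:Body')->item(0);
    return $body === null ? null : $body->textContent;
}

// Stand-in for a saved response; the <payload> element is made up.
$raw = '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
     . '<soap:Body><payload>' . str_repeat('A', 12000000) . '</payload></soap:Body>'
     . '</soap:Envelope>';

$text = parse_huge_soap_body($raw);
echo $text === null ? "parse failed\n" : strlen($text) . " characters\n";
```

This bypasses SoapClient's own decoding of the response into PHP values, so any mapping from the XML to the expected result structure has to be done by hand.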
 [2009-12-01 02:05 UTC] svn@php.net
Automatic comment from SVN on behalf of felipe
Revision: http://svn.php.net/viewvc/?view=revision&revision=291533
Log: - Fixed bug #49660 (libxml 2.7.3+ limits text nodes to 10MB). (Felipe)
- Added LIBXML_PARSEHUGE constant to overrides the maximum text size of a
  single text node when using libxml2.7.3+. (Kalle)
[DOC]
 [2009-12-01 02:06 UTC] felipe@php.net
This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 [2009-12-01 11:15 UTC] felipe@php.net
"Added LIBXML_PARSEHUGE constant to overrides the maximum text size of a single text node when using libxml2.7.3+."
 [2011-12-02 12:30 UTC] gooh@php.net
-Status: Open +Status: Closed -Assigned To: +Assigned To: gooh
 [2016-01-11 16:36 UTC] pretandor at gmx dot de
For anyone who runs into this bug through SoapClient and not through SimpleXML directly: you can switch to the Zend Soap component:

(standalone)
https://github.com/zendframework/zend-soap
 
Last updated: Sat Dec 14 00:01:26 2024 UTC