Bug #49660 libxml 2.7.3+ limits text nodes to 10MB
Submitted: 2009-09-24 15:37 UTC Modified: 2011-12-02 12:30 UTC
Votes:8
Avg. Score:4.6 ± 0.5
Reproduced:8 of 8 (100.0%)
Same Version:2 (25.0%)
Same OS:1 (12.5%)
From: sta at netimage dot dk Assigned: gooh (profile)
Status: Closed Package: SOAP related
PHP Version: * OS: FreeBSD 7.1
Private report: No CVE-ID: None
 [2009-09-24 15:37 UTC] sta at netimage dot dk
Description:
------------
Since version 2.7.3 libxml limits the maximum size of a single text node to 10MB.
The limit can be removed with a new option, XML_PARSE_HUGE.
PHP has no way to specify this option to libxml.

I found the bug when making a SOAP request where the reply contained a 20MB string.
SoapClient->__call() threw an exception: 'looks like we got no XML document'

Using libxml_use_internal_errors(true) and libxml_get_errors() I could narrow it down to a LibXMLError, code 5, 'Extra content at the end of the document' - but the reported line and column were in the middle of a large text node.

Using SoapClient->__getLastResponse() I saved the response to a file.
The libxml2 program xmllint then revealed the cause:
> xmllint --noout soap_response.txt 
soap_response.txt:111834: error: xmlSAX2Characters: huge text node: out of memory

We need a way to specify the XML_PARSE_HUGE option to libxml - perhaps something like a new function: libxml_parse_huge(true).
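
The diagnostic step described above (collecting libxml errors instead of letting them surface as PHP warnings) can be sketched roughly as follows. This is only an illustration: the malformed input string is a stand-in and not taken from the report, so the code and message it produces will differ from the code 5 error seen with the real SOAP response.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Deliberately malformed input (an assumption for this sketch),
// used only to trigger a parse error.
$doc = simplexml_load_string('<test>unclosed');

// Each entry is a LibXMLError with level, code, line, column and message.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf(
        "code %d at line %d, column %d: %s\n",
        $error->code,
        $error->line,
        $error->column,
        trim($error->message)
    );
}

// Empty the error buffer so later parses start clean.
libxml_clear_errors();
```

The line and column reported this way point at where the parser gave up, which for the huge text node case is somewhere inside the node rather than at an actual syntax error.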


Reproduce code:
---------------
<?php
$xml = "<?xml version='1.0' encoding='utf-8' standalone='yes' ?><test>" . str_repeat('A', 12000000) . "</test>";
file_put_contents('file.xml', $xml);
$sxe = simplexml_load_file('file.xml');
if ($sxe instanceof SimpleXMLElement) {
	echo "OK\n";
}
else {
	var_dump($sxe);
}


Expected result:
----------------
OK

Actual result:
--------------
PHP Warning:  simplexml_load_file(): file.xml:1: error: xmlSAX2Characters: huge text node: out of memory in /usr/dana/data/developers/holst/mobilmap/cron/xml.php on line 5
PHP Warning:  simplexml_load_file(): AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA in /usr/dana/data/developers/holst/mobilmap/cron/xml.php on line 5
PHP Warning:  simplexml_load_file():                                                                                ^ in /usr/dana/data/developers/holst/mobilmap/cron/xml.php on line 5
bool(false)

 [2009-09-26 12:12 UTC] kalle@php.net
I guess we could expose the constant value from ext/libxml if available like:
Index: libxml.c
===================================================================
--- libxml.c	(revision 288659)
+++ libxml.c	(working copy)
@@ -622,6 +622,9 @@
 	REGISTER_LONG_CONSTANT("LIBXML_COMPACT",	XML_PARSE_COMPACT,		CONST_CS | CONST_PERSISTENT);
 	REGISTER_LONG_CONSTANT("LIBXML_NOXMLDECL",	XML_SAVE_NO_DECL,		CONST_CS | CONST_PERSISTENT);
 #endif
+#if LIBXML_VERSION >= 20703
+	REGISTER_LONG_CONSTANT("LIBXML_PARSEHUGE",	XML_PARSE_HUGE,			CONST_CS | CONST_PERSISTENT);
+#endif
 	REGISTER_LONG_CONSTANT("LIBXML_NOEMPTYTAG",	LIBXML_SAVE_NOEMPTYTAG,	CONST_CS | CONST_PERSISTENT);
 
 	/* Error levels */


Does this work for you when passing it to SimpleXML's $option parameter?

(Patch made against PHP_5_3, but is just a 3 line c/p to other branches)
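
For reference, passing the proposed constant through SimpleXML's options parameter might look like the sketch below. It assumes a build where LIBXML_PARSEHUGE is defined (the patch only registers it for libxml 2.7.3+), hence the defined() guard; it uses simplexml_load_string() rather than a file so it has no side effects on disk.

```php
<?php
// Build a document whose single text node is about 12 MB, as in the report.
$xml = "<?xml version='1.0' encoding='utf-8'?><test>"
     . str_repeat('A', 12000000)
     . "</test>";

// The constant is only registered when PHP is built against libxml 2.7.3+,
// so fall back to no extra options if it is missing.
$options = defined('LIBXML_PARSEHUGE') ? LIBXML_PARSEHUGE : 0;

// Keep parse errors out of the output; they can be read with libxml_get_errors().
libxml_use_internal_errors(true);
$sxe = simplexml_load_string($xml, 'SimpleXMLElement', $options);

if ($sxe instanceof SimpleXMLElement) {
    echo "OK, text length: ", strlen((string) $sxe), "\n";
} else {
    echo "Parse failed\n";
}
```

Since the flag lifts libxml's built-in size limits, it is probably best reserved for input from sources you trust.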
 [2009-09-28 11:31 UTC] sta at netimage dot dk
Hi Kalle.

Thanks for replying so soon.

Your patch does fix the issue when loading directly with 
simplexml_load_file('file.xml', 'SimpleXMLElement', LIBXML_PARSEHUGE);

However the error was not originally encountered using simplexml, but using SoapClient - and there does not seem to be a way to pass libxml-options to SoapClient.

So as far as I can tell, the problem remains when using SoapClient.

/Thing
 [2009-12-01 02:05 UTC] svn@php.net
Automatic comment from SVN on behalf of felipe
Revision: http://svn.php.net/viewvc/?view=revision&revision=291533
Log: - Fixed bug #49660 (libxml 2.7.3+ limits text nodes to 10MB). (Felipe)
- Added LIBXML_PARSEHUGE constant to override the maximum text size of a
  single text node when using libxml 2.7.3+. (Kalle)
[DOC]
 [2009-12-01 02:06 UTC] felipe@php.net
This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 [2009-12-01 11:15 UTC] felipe@php.net
"Added LIBXML_PARSEHUGE constant to override the maximum text size of a single text node when using libxml 2.7.3+."
 [2011-12-02 12:30 UTC] gooh@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: gooh
 [2016-01-11 16:36 UTC] pretandor at gmx dot de
For everyone who came across this bug and is not using SimpleXML directly but SoapClient, you can switch to Zend Soap Component:

(standalone)
https://github.com/zendframework/zend-soap
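
Another possible workaround, not proposed anywhere in this thread and offered only as a sketch: since the original report shows that SoapClient->__getLastResponse() still returns the raw XML after the call fails, that XML could be parsed by hand with LIBXML_PARSEHUGE. The helper name parseHugeSoapBody(), the element names in the sample envelope, and the assumption of a SOAP 1.1 envelope namespace are all hypothetical.

```php
<?php
/**
 * Hypothetical helper: parse a raw SOAP 1.1 response with LIBXML_PARSEHUGE
 * and return the first child element of the SOAP Body, or null on failure.
 */
function parseHugeSoapBody(string $rawResponse): ?DOMElement
{
    $options = defined('LIBXML_PARSEHUGE') ? LIBXML_PARSEHUGE : 0;

    // Suppress warnings while parsing, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($rawResponse, $options);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('soapenv', 'http://schemas.xmlsoap.org/soap/envelope/');
    $nodes = $xpath->query('/soapenv:Envelope/soapenv:Body/*[1]');
    $node = $nodes === false ? null : $nodes->item(0);

    return $node instanceof DOMElement ? $node : null;
}

// In real use the raw XML would come from a client created with
// ['trace' => 1], via $client->__getLastResponse() after the call throws.
// Here a hand-built envelope with a 12 MB string stands in for it.
$raw = '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">'
     . '<soapenv:Body><getDataResponse><return>'
     . str_repeat('A', 12000000)
     . '</return></getDataResponse></soapenv:Body></soapenv:Envelope>';

$result = parseHugeSoapBody($raw);
echo $result === null ? "Parse failed\n" : $result->nodeName . "\n";
```

This bypasses SoapClient's own decoding, so the WSDL type mapping is lost and the values have to be pulled out of the DOM manually.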
 