PHP Bugs  
Bug #39736 serialize() consumes insane amount of RAM
Submitted: 4 Dec 2006 9:24pm UTC
Modified: 6 Dec 2006 11:09pm UTC
From: cnww at hfx dot andara dot com
Assigned to:
Status: Closed
Category: Arrays related
Version: 5.2.0
OS: Debian GNU/Linux
Votes: 1
Avg. Score: 4.0 ± 0.0
Reproduced: 1 of 1 (100.0%)
Same Version: 1 (100.0%)
Same OS: 1 (100.0%)

[4 Dec 2006 9:24pm UTC] cnww at hfx dot andara dot com
Description:
------------
When called on a tree implemented as a multi-dimensional array occupying
something like 60MB of RAM, serialize() fails with an error like:

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to
allocate 6282420 bytes) in /var/www/lib/metacache.php on line 105

So either the memory allocation system is busted or the serialization of
~60MB of data (actually the full text version is only 9.1MB) requires
over 1GB of memory.

Reproduce code:
---------------
The line that is actually failing is:

$meta_ser = serialize( &$metacache_buf );

For obvious reasons I have not posted the whole array here. 

I created an example program that reproduces the problem using an array
of 242,919 strings made with var_export() from the directory listing of
a Windows machine's D: drive.

You can get the code from:
http://arachne.k-pcltee.com/~cnww/serialbug.bz2

The example program seems to require around 52MB of RAM to create the
array, so if you're using the default 8MB limit it will fail before it
gets to the serialize() call. Any limit of 64MB or greater seems to
result in it failing during serialize().
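
For readers without the original test file, the following is a minimal sketch of how one might measure memory around a serialize() call. The build_tree() helper and its sizes are hypothetical stand-ins for the reporter's directory listing, and the 64M limit simply mirrors the figure mentioned above. Note that memory_get_peak_usage() reports the peak for the whole script so far, so it also includes the cost of building the array.

```php
<?php
// Mirror the limit discussed in the report (assumption: 64M is enough for this small stand-in).
ini_set('memory_limit', '64M');

// Hypothetical stand-in for the reporter's data: a nested array of short strings.
function build_tree($depth, $width)
{
    if ($depth === 0) {
        return str_repeat('x', 16);
    }
    $node = array();
    for ($i = 0; $i < $width; $i++) {
        $node["entry_$i"] = build_tree($depth - 1, $width);
    }
    return $node;
}

$tree = build_tree(4, 8); // 8^4 = 4096 leaf strings

$before = memory_get_usage();      // bytes in use just before serializing
$out    = serialize($tree);
$peak   = memory_get_peak_usage(); // script-wide peak, including tree construction

printf("serialized length:         %d bytes\n", strlen($out));
printf("memory before serialize(): %d bytes\n", $before);
printf("peak memory so far:        %d bytes\n", $peak);
```

Comparing the peak against the serialized length at a few different tree sizes gives a rough idea of whether memory grows in proportion to the data.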

Expected result:
----------------
It should print "Exiting normally"

Actual result:
--------------
Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to
allocate 6010844 bytes) in /tmp/serialbug.php on line 313030
[4 Dec 2006 9:26pm UTC] iliaa@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

The serialize() representation of the data is quite a bit larger
than its representation in memory. So a 60MB array could
easily take 120-180MB as a serialized string.
[6 Dec 2006 12:17am UTC] cnww at hfx dot andara dot com
Yes, obviously the serialized string can be larger than the array in
memory. The issue here is that when you get a sufficiently complex array
the memory required to serialize suddenly balloons out to SEVERAL
HUNDRED TIMES the size of the array.

Just for the heck of it I created some more swap space and tried this
again with a memory limit of 4GB (actually 4095MB, apparently PHP cannot
handle 4096). This allowed the serialize() call to succeed. It
eventually produced a 6.8MB output string (7,147,979 characters to be
exact), and the PHP process's peak memory usage while creating this 6.8MB
string was 1,540MB of RAM.

So what is it doing with ~7MB of data that requires over 1.5GB of
RAM?
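
To put those figures side by side, here is a small worked calculation using only the numbers quoted in this thread. Treating the ~52MB needed by the example program as the in-memory array size for this run is an assumption, since the comment does not say which array was used.

```php
<?php
$output_bytes = 7147979; // serialized string length reported above
$peak_mb      = 1540;    // peak process memory reported above, in MB
$array_mb     = 52;      // assumption: in-memory size of the example array, from the original report

$output_mb       = $output_bytes / (1024 * 1024);
$ratio_to_output = $peak_mb / $output_mb;
$ratio_to_array  = $peak_mb / $array_mb;

printf("output size: %.1f MB\n", $output_mb);
printf("peak memory vs. output string: about %.0fx\n", $ratio_to_output);
printf("peak memory vs. in-memory array: about %.0fx\n", $ratio_to_array);
```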

I may be missing something, but at the moment I do not see any reason
why the memory requirement for serialize() ought to be anything more
than O(n) on the size of the input. I had a look at the source in var.c,
but nothing jumps out at me as an obvious problem. My best guess based
on the symptoms is that it's somehow copying the buffer and/or hash
table when recursing through an array of arrays.

If this really isn't a bug then I apologize for wasting your time. I'll
polish my own serialize() function a bit and post it as a comment in the
function reference for other people running into this problem.
[6 Dec 2006 1:05am UTC] cnww at hfx dot andara dot com
For the benefit of future viewers, the following function serializes the
test array in about 1.4 seconds on my machine (vs. about 2 minutes for
PHP's serialize(), but I expect that's mainly due to my function not
needing swap space) and it does not appear to significantly affect the
size of the PHP process in RAM (i.e. it serializes with "memory_limit =
64MB" on the same array that serialize() cannot handle with
"memory_limit = 1024MB"). The output is identical (same MD5 checksum) to
the output from serialize() on the same test array (I have not yet
proven that the output is always the same as serialize()).

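// Serializes nested arrays one level at a time and hands every non-array
// value to PHP's serialize(). $data is declared by reference in the
// signature, so no call-time "&" is needed at the call sites.
function my_serialize( &$data, $buf = "" )
{
	if( is_array( $data ))
	{
		// Array header: "a:<element count>:{"
		$buf .= "a:" . count( $data ) . ":{";
		foreach( $data as $key => &$value )
		{
			// Keys are always integers or strings; values may be nested arrays.
			$buf .= serialize( $key ) . my_serialize( $value );
		}
		$buf .= "}";
		return $buf;
	}
	else
	{
		return $buf . serialize( $data );
	}
}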

The only serious problem I see with it is that if it's called on a
non-array object that contains a complex array as a property then the
array member will end up being handled by serialize().
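
Since the output has only been compared on one test array, a quick check along the following lines may be useful. It is a sketch, not a proof: the function is restated (without call-time "&") so the snippet runs on its own, and the sample data is hypothetical. The second case shows one situation where the outputs can be expected to differ, because serialize() tracks values it has already seen within a single call while this wrapper issues many separate calls.

```php
<?php
// Restated from the comment above so this snippet is self-contained.
function my_serialize(&$data, $buf = "")
{
    if (is_array($data)) {
        $buf .= "a:" . count($data) . ":{";
        foreach ($data as $key => &$value) {
            $buf .= serialize($key) . my_serialize($value);
        }
        return $buf . "}";
    }
    return $buf . serialize($data);
}

// Hypothetical test data: a small tree of plain strings and integers.
$tree = array(
    'docs'  => array('a.txt' => 120, 'b.txt' => 'x'),
    'empty' => array(),
    7       => array('nested' => array(1, 2, 3)),
);
$native      = serialize($tree);
$custom      = my_serialize($tree);
$plain_match = (md5($native) === md5($custom));

// The same object stored twice: serialize() writes a back-reference for the
// second slot, while the wrapper serializes each slot independently.
$obj           = new stdClass();
$shared        = array($obj, $obj);
$shared_native = serialize($shared);
$shared_custom = my_serialize($shared);
$shared_match  = ($shared_native === $shared_custom);

var_dump($plain_match, $shared_match);
```

For plain nested arrays of scalars the two strings should match; for the repeated object they do not, so the wrapper is best limited to data without shared objects or references.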
[6 Dec 2006 11:56am UTC] tony2001@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php5.2-win32-latest.zip

[6 Dec 2006 11:09pm UTC] cnww at hfx dot andara dot com
Yes, using the same test array with the CVS snapshot
(php5.2-200612061930) serialize() succeeds within a memory limit of
64MB, and takes an average of 0.44 seconds (so it's also about 3x faster
than my_serialize()). I'd say that qualifies as a successful fix.
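
For anyone wanting to repeat this kind of timing comparison on their own build, a rough sketch follows. The average_seconds() helper and the generated array are hypothetical (the thread does not show how the averages were measured), and the closure syntax assumes PHP 5.3 or later.

```php
<?php
// Hypothetical helper: average wall-clock seconds of $fn over $runs calls.
function average_seconds($fn, $runs)
{
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        call_user_func($fn);
        $total += microtime(true) - $start;
    }
    return $total / $runs;
}

// Stand-in data: a two-level array of path-like strings (not the reporter's listing).
$data = array();
for ($i = 0; $i < 20000; $i++) {
    $data['dir_' . ($i % 200)]['file_' . $i] = 'D:/some/path/file_' . $i;
}

$avg = average_seconds(function () use ($data) {
    serialize($data);
}, 5);

printf("serialize() average over 5 runs: %.4f seconds\n", $avg);
printf("peak memory: %d bytes\n", memory_get_peak_usage());
```

Absolute numbers will vary by machine and PHP version, so only comparisons made on the same setup are meaningful.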

