Bug #39736 serialize() consumes insane amount of RAM
Submitted: 2006-12-04 21:24 UTC
Modified: 2006-12-06 23:09 UTC
From: cnww at hfx dot andara dot com
Assigned:
Status: Closed
Package: Arrays related
PHP Version: 5.2.0
OS: Debian GNU/Linux
Private report: No
CVE-ID: None
 [2006-12-04 21:24 UTC] cnww at hfx dot andara dot com
Description:
------------
When called on a tree implemented as a multi-dimensional array occupying something like 60MB of RAM, serialize() fails with an error like:

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 6282420 bytes) in /var/www/lib/metacache.php on line 105

So either the memory allocation system is busted or the serialization of ~60MB of data (actually the full text version is only 9.1MB) requires over 1GB of memory.

Reproduce code:
---------------
The line that is actually failing is:

$meta_ser = serialize( &$metacache_buf );

For obvious reasons I have not posted the whole array here. 

I created an example program that reproduces the problem using an array of 242,919 strings made with var_export() from the directory listing of a Windows machine's D: drive.

You can get the code from: http://arachne.k-pcltee.com/~cnww/serialbug.bz2

The example program seems to require around 52MB of RAM to create the array, so if you're using the default 8MB limit it will fail before it gets to the serialize() call. Any limit of 64MB or greater seems to result in it failing during serialize().

Expected result:
----------------
It should print "Exiting normally"

Actual result:
--------------
Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 6010844 bytes) in /tmp/serialbug.php on line 313030

 [2006-12-04 21:26 UTC] iliaa@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

The serialize() representation of the data is quite a bit larger 
than its representation in memory. So a 60MB array could 
easily take 120-180MB as a serialized string.
 [2006-12-06 00:17 UTC] cnww at hfx dot andara dot com
Yes, obviously the serialized string can be larger than the array in memory. The issue here is that when you get a sufficiently complex array the memory required to serialize suddenly balloons out to SEVERAL HUNDRED TIMES the size of the array.

Just for the heck of it I created some more swap space and tried this again with a memory limit of 4GB (actually 4095MB; apparently PHP cannot handle 4096), which allowed the serialize() call to succeed. It eventually produced a 6.8MB output string (7,147,979 characters to be exact), and the PHP process's peak memory usage while creating this 6.8MB string was 1,540MB of RAM.

So what is it doing with ~7MB of data that requires over 1.5GB of RAM?

I may be missing something, but at the moment I do not see any reason why the memory requirement for serialize() ought to be anything more than O(n) on the size of the input. I had a look at the source in var.c, but nothing jumps out at me as an obvious problem. My best guess based on the symptoms is that it's somehow copying the buffer and/or hash table when recursing through an array of arrays.
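
A rough way to check the O(n) expectation on any build is to compare the size of the serialized string with the extra peak memory PHP reports while producing it. The sketch below is only an illustration: build_tree() is a hypothetical helper that stands in for the real directory-listing array, and the tree dimensions are arbitrary. Note that memory_get_peak_usage() reports the peak since script start, so the difference printed here is an approximation, not an exact cost of the serialize() call.

```php
<?php
// Hypothetical helper (not from the report): builds a nested array with
// $width children per level and $width^$depth leaf strings in total.
function build_tree( $depth, $width )
{
	if( $depth === 0 )
	{
		return "leaf-" . mt_rand();
	}
	$node = array();
	for( $i = 0; $i < $width; $i++ )
	{
		$node["dir" . $i] = build_tree( $depth - 1, $width );
	}
	return $node;
}

// Arbitrary size for illustration: 10^4 = 10,000 leaves.
$tree = build_tree( 4, 10 );

// Memory in use once the array exists, before serializing.
$before = memory_get_usage();
$ser = serialize( $tree );
// Highest usage seen so far; approximates the high-water mark of serialize().
$peak = memory_get_peak_usage();

printf( "serialized size:     %d bytes\n", strlen( $ser ));
printf( "peak above baseline: %d bytes\n", $peak - $before );
```

If memory use is linear in the input, the second figure should stay within a small constant factor of the first as the tree grows. On the build described in this report it would presumably grow far faster.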

If this really isn't a bug then I apologize for wasting your time. I'll polish my own serialize() function a bit and post it as a comment in the function reference for other people running into this problem.
 [2006-12-06 01:05 UTC] cnww at hfx dot andara dot com
For the benefit of future viewers, the following function serializes the test array in about 1.4 seconds on my machine, vs. about 2 minutes for PHP's serialize() (though I expect that's mainly because my function does not need swap space). It does not appear to significantly affect the size of the PHP process in RAM: it serializes with "memory_limit = 64MB" the same array that serialize() cannot handle with "memory_limit = 1024MB". The output is identical (same MD5 checksum) to the output from serialize() on the same test array, although I have not yet proven that the output always matches serialize().

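function my_serialize( &$data, $buf = "" )
{
	// $data is already a reference via the signature, so it is passed on
	// without a call-time "&" (which is a fatal error as of PHP 5.4).
	if( is_array( $data ))
	{
		// Emit the array header by hand, then recurse per element so that
		// the built-in serialize() only ever sees scalars.
		$buf .= "a:" . count( $data ) . ":{";
		foreach( $data as $key => &$value )
		{
			$buf .= serialize( $key ) . my_serialize( $value );
		}
		$buf .= "}";
		return $buf;
	}
	else
	{
		// Non-array values (including objects) go to the built-in function.
		return $buf . serialize( $data );
	}
}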

The only serious problem I see with it is that if it's called on a non-array object that contains a complex array as a property then the array member will end up being handled by serialize().
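
Since the claim that the output matches serialize() rests on a single MD5 comparison, a small equivalence check may be useful. This is a minimal sketch, not a proof: serialize_nested() is a hypothetical restatement of the same array-walking idea, and $sample is made-up data covering string keys, integer keys, an empty array, and a few scalar types. It deliberately contains no objects, which are the known weak spot mentioned above.

```php
<?php
// Hypothetical name: same approach as the function above, restated so this
// block runs on its own. Arrays are walked by hand; everything else is
// delegated to the built-in serialize().
function serialize_nested( $data )
{
	if( !is_array( $data ))
	{
		return serialize( $data );
	}
	$buf = "a:" . count( $data ) . ":{";
	foreach( $data as $key => $value )
	{
		$buf .= serialize( $key ) . serialize_nested( $value );
	}
	return $buf . "}";
}

// Made-up sample data, loosely shaped like a directory listing.
$sample = array(
	"D:" => array(
		"docs" => array( "a.txt", "b.txt" ),
		"empty" => array(),
		7 => "numeric key",
	),
	"count" => 3,
	"ratio" => 0.5,
	"flag" => true,
	"nothing" => null,
);

$expected = serialize( $sample );
$actual = serialize_nested( $sample );

// Byte-for-byte comparison, then a round trip through unserialize().
echo ( $actual === $expected ) ? "match\n" : "MISMATCH\n";
var_dump( unserialize( $actual ) === $sample );
```

A check like this only covers the shapes present in $sample. Running the same comparison against the real test array is what the MD5 result above corresponds to.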
 [2006-12-06 11:56 UTC] tony2001@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php5.2-win32-latest.zip


 [2006-12-06 23:09 UTC] cnww at hfx dot andara dot com
Yes, using the same test array with the CVS snapshot (php5.2-200612061930), serialize() succeeds within a memory limit of 64MB and takes an average of 0.44 seconds (so it's also about 3x faster than my_serialize()). I'd say that qualifies as a successful fix.
 