| Bug #39736 | serialize() consumes insane amount of RAM | ||||
|---|---|---|---|---|---|
| Submitted: | 4 Dec 2006 9:24pm UTC | Modified: | 6 Dec 2006 11:09pm UTC | ||
| From: | cnww at hfx dot andara dot com | Assigned to: | |||
| Status: | Closed | Category: | Arrays related | ||
| Version: | 5.2.0 | OS: | Debian GNU/Linux | ||
| Votes: | 1 | Avg. Score: | 4.0 ± 0.0 | Reproduced: | 1 of 1 (100.0%) |
| Same Version: | 1 (100.0%) | Same OS: | 1 (100.0%) | ||
[4 Dec 2006 9:24pm UTC] cnww at hfx dot andara dot com
[4 Dec 2006 9:26pm UTC] iliaa@php.net
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.php.net/manual/ and the instructions on how to report a bug at http://bugs.php.net/how-to-report.php

The serialize() representation of the data is quite a bit larger than its representation in memory, so a 60MB array could easily take 120-180MB as a serialized string.
[6 Dec 2006 12:17am UTC] cnww at hfx dot andara dot com
Yes, obviously the serialized string can be larger than the array in memory. The issue here is that when you get a sufficiently complex array the memory required to serialize suddenly balloons out to SEVERAL HUNDRED TIMES the size of the array.

Just for the heck of it I created some more swap space and tried this again with a memory limit of 4GB (actually 4095MB, apparently PHP cannot handle 4096), which allowed the serialize() call to succeed. It eventually produced a 6.8MB output string (7,147,979 characters to be exact), and the PHP process's peak memory usage while creating this 6.8MB string was 1,540MB of RAM. So what is it doing with ~7MB of data that requires over 1.5GB of RAM?

I may be missing something, but at the moment I do not see any reason why the memory requirement for serialize() ought to be anything more than O(n) in the size of the input. I had a look at the source in var.c, but nothing jumps out at me as an obvious problem. My best guess based on the symptoms is that it's somehow copying the buffer and/or hash table when recursing through an array of arrays.

If this really isn't a bug then I apologize for wasting your time. I'll polish my own serialize() function a bit and post it as a comment in the function reference for other people running into this problem.
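For anyone wanting to reproduce this kind of measurement, here is a minimal sketch that compares the length of the serialized string with the growth in peak memory across the serialize() call. The helper build_nested_array() and its depth/width values are hypothetical stand-ins, not the original test array, so the numbers will not match the ones reported above.

```php
<?php
// Hypothetical helper: build an array of arrays with $width entries per
// level and $depth levels, ending in short string leaves.
function build_nested_array( $depth, $width )
{
    if( $depth == 0 )
    {
        return "leaf";
    }
    $out = array();
    for( $i = 0; $i < $width; $i++ )
    {
        $out["key$i"] = build_nested_array( $depth - 1, $width );
    }
    return $out;
}

// Assumed test size: 8^4 = 4096 leaves.
$data = build_nested_array( 4, 8 );

// memory_get_peak_usage() reports the highest allocation seen so far,
// so the difference is only a rough lower bound on what serialize() needed.
$before = memory_get_peak_usage();
$str = serialize( $data );
$after = memory_get_peak_usage();

printf( "serialized length: %d bytes\n", strlen( $str ));
printf( "peak memory growth during serialize(): %d bytes\n", $after - $before );
```

If the second figure is hundreds of times the first on a given build, that build is likely showing the behaviour described here.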
[6 Dec 2006 1:05am UTC] cnww at hfx dot andara dot com
For the benefit of future viewers, the following function serializes the
test array in about 1.4 seconds on my machine (vs. about 2 minutes for
PHP's serialize(), but I expect that's mainly due to my function not
needing swap space) and it does not appear to significantly affect the
size of the PHP process in RAM (i.e. it serializes with "memory_limit =
64MB" on the same array that serialize() cannot handle with
"memory_limit = 1024MB"). The output is identical (same MD5 checksum) to
the output from serialize() on the same test array (I have not yet
proven that the output is always the same as serialize()).
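    // Serialize $data, recursing into nested arrays in PHP code so that
    // serialize() itself only ever sees array keys and non-array values.
    // The reference is declared in the signature only; arguments are
    // passed without "&" at the call site.
    function my_serialize( &$data, $buf = "" )
    {
        if( is_array( $data ))
        {
            // Same header serialize() emits for an array: a:<count>:{
            $buf .= "a:" . count( $data ) . ":{";
            foreach( $data as $key => &$value )
            {
                $buf .= serialize( $key ) . my_serialize( $value );
            }
            $buf .= "}";
            return $buf;
        }
        else
        {
            return $buf . serialize( $data );
        }
    }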
The only serious problem I see with it is that if it's called on a
non-array object that contains a complex array as a property then the
array member will end up being handled by serialize().
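A minimal sketch of the MD5 comparison mentioned above, for checking the function against serialize() on other inputs. It carries its own copy of my_serialize() (written without call-time pass-by-reference, which PHP 5.4 and later reject) so that it runs on its own; the $plain array is a hypothetical stand-in for the real test data, and a match on one input does not prove the outputs always agree.

```php
<?php
// Copy of my_serialize() from this report, with "&" kept only in the
// function signature.
function my_serialize( &$data, $buf = "" )
{
    if( is_array( $data ))
    {
        $buf .= "a:" . count( $data ) . ":{";
        foreach( $data as $key => &$value )
        {
            $buf .= serialize( $key ) . my_serialize( $value );
        }
        $buf .= "}";
        return $buf;
    }
    return $buf . serialize( $data );
}

// Hypothetical sample input: string and integer keys, nested and empty arrays.
$plain = array( "a" => 1, 0 => array( true, null, "x" ), "n" => array() );

// Take the reference output first, before my_serialize() touches the array.
$expected = serialize( $plain );
$actual = my_serialize( $plain );

var_dump( md5( $actual ) === md5( $expected ));
```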
[6 Dec 2006 11:56am UTC] tony2001@php.net
Please try using this CVS snapshot: http://snaps.php.net/php5.2-latest.tar.gz

For Windows: http://snaps.php.net/win32/php5.2-win32-latest.zip
[6 Dec 2006 11:09pm UTC] cnww at hfx dot andara dot com
Yes, using the same test array with the CVS snapshot (php5.2-200612061930) serialize() succeeds within a memory limit of 64MB, and takes an average of 0.44 seconds (so it's also about 3x faster than my_serialize()). I'd say that qualifies as a successful fix.
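A rough sketch of how an average like this can be taken, assuming a 64M limit set at run time. The helper average_seconds() and the stand-in $data array are hypothetical, so the timings it prints are not comparable to the figures above.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call when
// $function is applied to $arg $runs times.
function average_seconds( $function, $arg, $runs )
{
    $start = microtime( true );
    for( $i = 0; $i < $runs; $i++ )
    {
        call_user_func( $function, $arg );
    }
    return ( microtime( true ) - $start ) / $runs;
}

// Assumed limit for the run; exceeding it aborts the script with a fatal error.
ini_set( "memory_limit", "64M" );

// Stand-in data: 1000 small nested records.
$data = array();
for( $i = 0; $i < 1000; $i++ )
{
    $data[] = array( "id" => $i, "tags" => array( "a", "b", "c" ));
}

printf( "serialize(): %.6f s per call\n", average_seconds( "serialize", $data, 10 ));
```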
