Request #52832 unserialize() performance
Submitted: 2010-09-14 02:46 UTC
Modified: 2010-09-20 02:04 UTC
From: galaxy dot mipt at gmail dot com
Assigned: kalle
Status: Closed
Package: Performance problem
PHP Version: 5.3.3
OS: Linux
Private report: No
CVE-ID: None
 [2010-09-14 02:46 UTC] galaxy dot mipt at gmail dot com
Description:
------------
The performance of the built-in unserializer degrades at an unexpectedly high rate as the size of the unserialized data grows (more precisely, as the number of serialized items grows). For example, unserializing a plain array of ~1000000 integers can take about 10 seconds on an average P4 machine, and the worst part is that the time rises quadratically (O(n^2)) with the array size, i.e. an array of ~2000000 elements would take 40 seconds or so.

The main performance killer is the var_hash linked list onto which every extracted variable is pushed. During every push operation (var_push() in ext/standard/var_unserializer.c) the list is walked sequentially from the very beginning up to, in effect, the very end. Searching from the end instead (or simply storing the last used element somewhere) would save a lot of cycles.

In my tests doing so reduced the unserialize time from 7 secs to ~0.3 sec on 1000000-size array and size dependency apparently changed to something more like O(n*log(n))
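
To illustrate the difference between the two strategies, here is a minimal, simplified sketch. It is not the code from var_unserializer.c or from the attached patch: the names (`block`, `var_list`, `push_scan`, `push_tail`, `count_visited`) and the block capacity of 1024 are assumptions made for the example. It only counts how many full blocks each push strategy has to step over.

```c
#include <assert.h>
#include <stdlib.h>

#define SLOTS_PER_BLOCK 1024 /* assumed block capacity, for illustration only */

/* One fixed-size block of the list; all names here are hypothetical. */
typedef struct block {
    long slots[SLOTS_PER_BLOCK];
    int used;
    struct block *next;
} block;

typedef struct {
    block *first;
    block *last;  /* cached tail; only push_tail() relies on it */
    long visited; /* number of full blocks stepped over so far */
} var_list;

static block *block_new(void) {
    block *b = calloc(1, sizeof *b);
    assert(b != NULL);
    return b;
}

/* Head scan: step over every full block before finding room. */
static void push_scan(var_list *l, long value) {
    block *b = l->first, *prev = NULL;
    while (b != NULL && b->used == SLOTS_PER_BLOCK) {
        prev = b;
        b = b->next;
        l->visited++;
    }
    if (b == NULL) {
        b = block_new();
        if (prev != NULL) prev->next = b; else l->first = b;
        l->last = b;
    }
    b->slots[b->used++] = value;
}

/* Cached tail: go straight to the last block, no walking. */
static void push_tail(var_list *l, long value) {
    block *b = l->last;
    if (b == NULL || b->used == SLOTS_PER_BLOCK) {
        block *nb = block_new();
        if (b != NULL) b->next = nb; else l->first = nb;
        l->last = nb;
        b = nb;
    }
    b->slots[b->used++] = value;
}

/* Push n values with the chosen strategy, free the list,
   and return how many full blocks were stepped over in total. */
static long count_visited(long n, int use_tail) {
    var_list l = { NULL, NULL, 0 };
    for (long i = 0; i < n; i++) {
        if (use_tail) push_tail(&l, i); else push_scan(&l, i);
    }
    for (block *b = l.first; b != NULL; ) {
        block *next = b->next;
        free(b);
        b = next;
    }
    return l.visited;
}
```

In this sketch, `count_visited(100000, 0)` steps over 4832928 blocks and `count_visited(200000, 0)` steps over 19431360, so doubling the input roughly quadruples the work, while `count_visited(n, 1)` never walks the list at all.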


Patches

php-5.3-unserialize-performance.patch (last revision 2010-09-14 16:32 UTC by galaxy dot mipt at gmail dot com)

 [2010-09-14 02:58 UTC] cataphract@php.net
> In my tests doing so reduced the unserialize time from 7 secs to ~0.3 sec on 1000000-size array and size dependency apparently changed to something more like O(n*log(n))

Could you submit a patch with that modification and a test script that exemplifies the speedup?
 [2010-09-14 18:36 UTC] galaxy dot mipt at gmail dot com
I have added a patch against the latest SVN version. It is written so as to require the least code modification.

Here goes the test script:

<?php
// The largest arrays need a lot of memory for both the data and its serialized form.
ini_set('memory_limit', '512M');

$sizes = array(100000, 200000, 500000, 1000000);

foreach ($sizes as $N) {

    // Build a plain array of $N random integers.
    $data = array();
    for ($i = 0; $i < $N; $i++) $data[] = mt_rand();

    // Times are accumulated over all 10 runs below.
    $timeSerialize = 0;
    $timeUnserialize = 0;

    for ($run = 0; $run < 10; $run++) {

        $ts = microtime(true);
        $ser = serialize($data);
        $timeSerialize += microtime(true) - $ts;

        $ts = microtime(true);
        $unser = unserialize($ser);
        $timeUnserialize += microtime(true) - $ts;

        // Sanity check: the round trip must reproduce the original array.
        if (count($data) != count($unser)) print "Error: array sizes mismatch\n";
        for ($i = 0; $i < $N; $i++)
            if (!isset($unser[$i]) || $data[$i] != $unser[$i])
                print "Error: array elements mismatch\n";

        unset($ser);
        unset($unser);
    }

    print "Size: $N\t\tSerialize: " . (floor(1000*$timeSerialize)) . "ms\t\tUnserialize: " . (floor(1000*$timeUnserialize)) . "ms\n\n";

}
?>


The script is fairly memory-hungry, so the array sizes may need to be reduced depending on the available hardware.

My test results (each time is the total over the 10 runs):

Original PHP:
Size: 100000            Serialize: 483ms                Unserialize: 470ms

Size: 200000            Serialize: 1047ms               Unserialize: 1308ms

Size: 500000            Serialize: 2638ms               Unserialize: 14360ms

Size: 1000000           Serialize: 6319ms               Unserialize: 72744ms

Patched PHP:
Size: 100000            Serialize: 500ms                Unserialize: 357ms

Size: 200000            Serialize: 870ms                Unserialize: 703ms

Size: 500000            Serialize: 2212ms               Unserialize: 1315ms

Size: 1000000           Serialize: 4898ms               Unserialize: 2823ms
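
Since the figures above are totals over 10 runs, a small helper makes it easier to compare them with the per-call times quoted in the description. This is only a convenience sketch. The function names `per_run_ms` and `speedup` are made up for the example and are not part of the test script.

```c
#include <assert.h>

/* Average milliseconds per run, given a total accumulated over `runs` runs. */
static double per_run_ms(double total_ms, int runs) {
    assert(runs > 0);
    return total_ms / runs;
}

/* How many times faster the patched build is than the original one. */
static double speedup(double original_ms, double patched_ms) {
    assert(patched_ms > 0.0);
    return original_ms / patched_ms;
}
```

For the 1000000-element array, `per_run_ms(72744, 10)` is about 7.3 seconds per unserialize() call before the patch and `per_run_ms(2823, 10)` about 0.28 seconds after it, a speedup of roughly 26x. At 100000 elements, `speedup(470, 357)` is only about 1.3x, which fits a cost that grows much faster than linearly in the unpatched build.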
 [2010-09-15 04:46 UTC] kalle@php.net
-Status: Open +Status: Assigned -Assigned To: +Assigned To: kalle
 [2010-09-15 04:46 UTC] kalle@php.net
Hi, we cannot merge this into 5.3, as it changes a structure (php_unserialize_data) that is exported to extensions as a type, which would break the ABI. But without a doubt it should go into trunk at least.
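
The ABI concern can be sketched as follows. It is an illustration only: the struct and field names are hypothetical stand-ins, not the real php_unserialize_data definition or the patch. It assumes that extensions allocate the structure themselves, so its size and field offsets are compiled into their binaries.

```c
#include <assert.h>
#include <stddef.h>

/* Layout an extension built against the old headers would have compiled in
   (hypothetical field names). */
struct unserialize_data_old {
    void *first;
    void *first_dtor;
};

/* Hypothetical patched layout with cached tail pointers added. */
struct unserialize_data_new {
    void *first;
    void *last;
    void *first_dtor;
    void *last_dtor;
};

/* Bytes the core would touch beyond what an old extension allocated. */
static size_t bytes_short(void) {
    return sizeof(struct unserialize_data_new)
         - sizeof(struct unserialize_data_old);
}

/* Non-zero if an existing field now lives at a different offset. */
static int first_dtor_moved(void) {
    return offsetof(struct unserialize_data_new, first_dtor)
        != offsetof(struct unserialize_data_old, first_dtor);
}
```

Under these assumptions, an extension compiled against the old layout would hand the core a structure that is two pointers too small and whose existing field has moved. That is why such a change is safe only in a branch where extensions are rebuilt anyway.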
 [2010-09-18 18:09 UTC] kalle@php.net
Implemented in trunk, thanks for your work.
 [2010-09-20 02:04 UTC] kalle@php.net
-Status: Assigned +Status: Closed
 