Bug #47643 array_diff() takes over 3000 times longer than php 5.2.4
Submitted: 2009-03-13 11:49 UTC Modified: 2010-11-01 18:18 UTC
Votes:35
Avg. Score:4.7 ± 0.6
Reproduced:28 of 29 (96.6%)
Same Version:22 (78.6%)
Same OS:17 (60.7%)
From: viper7 at viper-7 dot com Assigned: felipe (profile)
Status: Closed Package: Performance problem
PHP Version: 5.*, 6CVS (2009-04-13) OS: *
Private report: No CVE-ID: None
 [2009-03-13 11:49 UTC] viper7 at viper-7 dot com
Description:
------------
This bug was reported in ##php on freenode, and after thorough testing on multiple machines we concluded that it must be an engine bug.

array_diff() on two large arrays of md5 hashes (600,000 elements each) takes approximately 4 seconds on a fast server in PHP 5.2.4 and below (confirmed with PHP 5.2.0), but over 4 hours (!) on PHP 5.2.6 and later (confirmed with PHP 5.2.9 and PHP 5.3.0 beta2).


Reproduce code:
---------------
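<?php
// Build two arrays of 600,000 md5 hashes each; 100,000 values overlap
// (md5(500001) .. md5(600000)), so 500,000 entries should remain.
$data1 = array();
$data2 = array();
$i = 0; $j = 500000;
while ($i < 600000) {
	$i++; $j++;
	$data1[] = md5($i);
	$data2[] = md5($j);
}

$time = microtime(true);

echo "Starting array_diff\n";
$data_diff1 = array_diff($data1, $data2);

$time = microtime(true) - $time;

echo 'array_diff() took ' . number_format($time, 3) . ' seconds and returned ' . count($data_diff1) . " entries\n";
?>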

Expected result:
----------------
Starting array_diff
array_diff() took 3.778 seconds and returned 500000 entries

Actual result:
--------------
Starting array_diff
array_diff() took 14826.278 seconds and returned 500000 entries

History

 [2009-03-24 21:19 UTC] cisa at cisa85 dot de
As I described in [1], I use this function to get the performance I need:


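function array_diff_fast($data1, $data2) {
    // Turn the values into keys so membership tests become hash lookups.
    // Note: array_flip() collapses duplicate values and discards the original keys of $data1.
    $data1 = array_flip($data1);
    $data2 = array_flip($data2);

    // Remove every hash from $data1 that also appears in $data2.
    foreach($data2 as $hash => $key) {
       if (isset($data1[$hash])) unset($data1[$hash]);
    }

    // Flip back so the remaining hashes are values again.
    return array_flip($data1);
}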

Thanks to Viper for his help.

[1] http://nohostname.de/blog/2009/03/24/bug-gefunden-array_diff-in-php-526-unglaublich-langsam/
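The array_flip() workaround is fast because it turns the second array into hash-table keys, so each membership test is a constant-time lookup instead of a search. Its caveat is that array_flip() collapses duplicate values and drops the original keys of $data1, whereas array_diff() keeps both. Below is a minimal sketch of a key-preserving variant of the same idea, assuming scalar values; the function name array_diff_hashed is hypothetical. It casts to string because array_diff() treats two elements as equal when (string) $elem1 === (string) $elem2.

```php
<?php
// Hypothetical helper: hash-based equivalent of array_diff() for scalar values.
// Values are cast to string, matching array_diff()'s (string) === equality rule.
function array_diff_hashed(array $data1, array $data2): array
{
    // Build a set of the string forms of every value in $data2.
    $lookup = [];
    foreach ($data2 as $value) {
        $lookup[(string) $value] = true;
    }

    // Keep each element of $data1, with its original key, that is not in the set.
    $result = [];
    foreach ($data1 as $key => $value) {
        if (!isset($lookup[(string) $value])) {
            $result[$key] = $value;
        }
    }
    return $result;
}
```

Unlike array_diff_fast(), this keeps duplicates and keys from the first array, at the cost of one extra array for the result.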
 [2009-06-30 15:19 UTC] viper7 at viper-7 dot com
I've tracked down the change that broke things; this is it. The exact reason is beyond me, heh. Hopefully this helps.

http://cvs.php.net/viewvc.cgi/php-src/ext/standard/array.c?r1=1.308.2.21.2.51&r2=1.308.2.21.2.52&pathrev=PHP_5_2
 [2009-06-30 15:22 UTC] derick@php.net
Dmitry, could you have a look? I have no idea why this occurs.
 [2009-07-01 15:32 UTC] dmitry@php.net
The problem occurs because of a "bad" patch for bug #42838.

The diff algorithm sorts the arrays using qsort and then assumes that they are sorted correctly. But when a user comparison function is involved, that can't be guaranteed. For example, in ext/standard/tests/array/bug42838.phpt, key_compare_func() can't sort the array correctly because the expressions (0 < 'a') and (0 > 'a') are both false ('a' is interpreted as the number 0).

It should be fixed in some way.
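Dmitry's point is that a sort-then-merge diff is only correct when both inputs are sorted under one consistent ordering. The sketch below is a hypothetical PHP rendering of that general idea, not the C code in ext/standard/array.c: sort both arrays, then walk them together with a single forward-moving index into the second array, for O(n log n) total work. With an inconsistent comparison (both (0 < 'a') and (0 > 'a') false), the forward walk could step past a matching element. Rewinding the index to the start for every element avoids that but makes the scan quadratic; the later suggestion about resetting "ptr" hints that the #42838 patch did something like this, though that is an inference rather than something stated in this report. The function name merge_diff is illustrative.

```php
<?php
// Hypothetical sketch of a sort-then-merge set difference (not PHP's C code).
function merge_diff(array $a, array $b): array
{
    // Sort both inputs by string value; asort() keeps the original keys of $a,
    // sort() reindexes $b as 0..n-1.
    asort($a, SORT_STRING);
    sort($b, SORT_STRING);

    $result = [];
    $j = 0;
    $n = count($b);
    foreach ($a as $key => $value) {
        $s = (string) $value;
        // Advance through $b monotonically. This is only valid because both
        // arrays were sorted with the same consistent total order.
        while ($j < $n && strcmp((string) $b[$j], $s) < 0) {
            $j++;
        }
        if ($j < $n && strcmp((string) $b[$j], $s) === 0) {
            continue; // value is present in $b, so drop it
        }
        $result[$key] = $value;
    }
    return $result; // same key/value pairs as array_diff(), but in sorted order
}
```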
 [2009-07-09 20:38 UTC] jani@php.net
As Dmitry noted, this is a side effect that your fix caused.
 [2010-01-17 12:09 UTC] emiel dot bruijntjes at copernica dot com
This bug has now been open for 10 months. Are you still working on this?
 [2010-02-17 20:53 UTC] maarten at talkin dot nl
Why don't you only reset ptr if (behavior & DIFF_ASSOC)?
 [2010-04-16 22:20 UTC] sylvain at jamendo dot com
I would also appreciate a patch; this issue made our servers crash after a PHP 5.3 upgrade :-/

thanks!
 [2010-08-04 05:21 UTC] lonnyk at gmail dot com
I feel as though the actual bug here is the fix that caused this issue. If you revert the fix and typecast the variables passed into the custom compare function as (string), then this works fine. This is in line with the other, non-user-defined comparison functions: they compare as === and not ==.
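The string-cast suggestion amounts to giving the comparator a consistent total order, which is also how array_diff() itself decides equality ((string) $elem1 === (string) $elem2). A minimal sketch of such a callback passed to array_udiff(); the variable name $stringCompare is illustrative:

```php
<?php
// Comparator that casts both operands to string before comparing, so 0 and 'a'
// compare as "0" < "a" instead of being loosely compared as numbers.
// strcmp() returns <0, 0 or >0, giving the sort step a consistent total order.
$stringCompare = function ($x, $y) {
    return strcmp((string) $x, (string) $y);
};

// Only 'a' is removed; keys of the first array are preserved.
$result = array_udiff([0, 'a', 'b'], ['a'], $stringCompare);
// $result is [0 => 0, 2 => 'b']
```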
 [2010-11-01 18:16 UTC] felipe@php.net
Automatic comment from SVN on behalf of felipe
Revision: http://svn.php.net/viewvc/?view=revision&revision=305011
Log: - Fixed bug #47643 (array_diff() takes over 3000 times longer than php 5.2.4)
 [2010-11-01 18:18 UTC] felipe@php.net
-Status: Assigned +Status: Closed
 [2010-11-01 18:18 UTC] felipe@php.net
This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 [2010-11-11 22:25 UTC] felipe@php.net
Automatic comment from SVN on behalf of felipe
Revision: http://svn.php.net/viewvc/?view=revision&revision=305279
Log: - Fixed bug #47643 (array_diff() takes over 3000 times longer than php 5.2.4)
 [2011-02-23 14:56 UTC] jaromir dot dolecek at skype dot net
Looking at the fix, the same problem seems possible with the DIFF_ASSOC option.
 [2014-11-28 14:39 UTC] samantha at adrichem dot nu
Could this fix be the reason why (since I can't find anything else in the changelog) array_diff() now (PHP 5.6) does string comparisons and no longer supports multidimensional arrays, whereas in PHP 5.3.23 it does (even though the documentation says it doesn't)?
 [2014-11-28 14:47 UTC] samantha at adrichem dot nu
Never mind: it just didn't generate an "Array to string conversion" notice before, and now it does.
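For reference, array_diff() is documented to compare string representations: two elements are equal only if (string) $elem1 === (string) $elem2. A small illustration, which also shows why nested arrays are not meaningfully supported:

```php
<?php
// array_diff() compares values by their string form, so the integer 1 and the
// string '1' count as the same element and both are removed.
$diff = array_diff([1, '1', 2, 'two'], ['1']);
// $diff is [2 => 2, 3 => 'two']

// An inner array would be cast to the string "Array" for comparison, raising an
// "Array to string conversion" notice (a warning as of PHP 8), so
// multidimensional arrays cannot be diffed meaningfully with array_diff().
```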
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Mar 19 10:01:30 2024 UTC