php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #64518 optimisation for "for ($i=0; $i<count($array);$i++)" when $array is global
Submitted: 2013-03-26 05:17 UTC Modified: 2013-03-26 21:04 UTC
From: php at richardneill dot org Assigned:
Status: Not a bug Package: Performance problem
PHP Version: 5.4.13 OS:
Private report: No CVE-ID: None
 [2013-03-26 05:17 UTC] php at richardneill dot org
Description:
------------
If an array is counted inside a for-loop, this is normally slightly inefficient.

$n = count ($array);                  //[1] optimised and a bit faster.
for ($i=0; $i< $n;$i++){
  //do stuff
}   

for ($i=0; $i<count($array);$i++){    //[2] perfectly respectable speed.
  //do stuff                          //very nearly as fast as [1].
}   



BUT, if our for-loop is inside a function AND our $array is a global variable, 
then method [2] becomes really really slow (though method [1] remains 
unaffected).

Below, the problematic function is add_slow(); the script does the same thing 4 
ways to demonstrate that 3 of them work well and one works slowly.

Test script:
---------------
#!/usr/bin/php  -ddisplay_errors=E_ALL
<?
echo "This demonstrates the problem with counting global arrays inside loops inside functions.\n";
$NUMBER = 10000; //adjust for your computer.
for ($i = 0; $i< $NUMBER; $i++){
	$data[$i] = "$i";
}

function add_slow(){	//This is the problematic function. $data is global, and the count() is inside the for().
	global $data;	//It is REALLY slow: 4000 times slower than the others!
	$sum=0;
	for ($i = 0; $i < count($data); $i++){  
		$sum += $data[$i];
	}
	echo "Total (slow) is $sum.\n";
}

function add_fast(){	//This one is fine. The count() is optimised by taking it out of the for().
	global $data;   //... but we're still using a global array, so that's not the problem.
	$sum=0;
	$n = count($data);			
	for ($i = 0; $i < $n; $i++){
		$sum += $data[$i];
	}
	echo "Total (fast) is $sum.\n";
}

function add_local(){  //This one is also fine. The count() is NOT optimised, but it still runs almost as quickly.
	global $NUMBER;
	for ($i = 0; $i< $NUMBER; $i++){
		$data[$i] = "$i";
	}
	$sum=0;
	for ($i = 0; $i < count($data); $i++){   
		$sum += $data[$i];
	}
	echo "Total (local) is $sum.\n";
}	
	
echo "Calling add_slow()...\n";
$t1 = microtime(true); 
add_slow();
$t2 = microtime(true);
$time = round($t2 - $t1, 3);
echo "Finished in $time seconds.\n\n";  

echo "Calling add_fast()...\n";
$t1 = microtime(true);
add_fast();
$t2 = microtime(true);
$time = round($t2 - $t1, 3);
echo "Finished in $time seconds.\n\n";

echo "Calling add_local()...\n";
$t1 = microtime(true);
add_local();
$t2 = microtime(true);
$time = round($t2 - $t1, 3);
echo "Finished in $time seconds.\n\n";

echo "Not using a function...\n";
$t1 = microtime(true);
$sum=0;
for ($i = 0; $i < count($data); $i++){   //This one is in the main() scope, it's also fast.
	$sum += $data[$i];
}
echo "Total (main loop) is $sum.\n";
$t2 = microtime(true);
$time = round($t2 - $t1, 3);
echo "Finished in $time seconds.\n\n";
?>

Expected result:
----------------
All 4 ways of summing this array should run in similar amounts of time (a few 
milliseconds). 

But actually add_slow() is 3900 times slower. 


Actual result:
--------------
This demonstrates the problem with counting global arrays inside loops inside 
functions.
Calling add_slow()...
Total (slow) is 49995000.
Finished in 7.86 seconds.

Calling add_fast()...
Total (fast) is 49995000.
Finished in 0.002 seconds.

Calling add_local()...
Total (local) is 49995000.
Finished in 0.006 seconds.

Not using a function...
Total (main loop) is 49995000.
Finished in 0.002 seconds.



Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2013-03-26 06:09 UTC] php at richardneill dot org
Here's a simpler test case. The problem is actually that accessing global 
variables from functions is much slower than local variables (and I was seeing 
this exacerbated by looping over them with count).

Note that passing the data directly as a function argument is much faster than 
passing it as a global variable - this seems to me to be the root of the 
problem. i.e. 

function ($a,$b,$c,$large_array){
   //do stuff  - this is fast.
}

function ($a,$b,$c){
   global $large_array
   //do stuff  - this is SLOW.
}




--------------------------------------------
#!/usr/bin/php  -ddisplay_errors=E_ALL
<?
$NUMBER = 100000; //adjust for your computer.
for ($i = 0; $i< $NUMBER; $i++){
	$data[$i] = "$i";
}

echo "Copy in main()...\n";
$t1 = microtime(true);
$copy = $data;
$t2 = microtime(true);
$time = ($t2 - $t1) * 1E6;
echo "Finished in $time microseconds.\n\n";	#5.0 us


function f1(){
	global $NUMBER;
	echo "Copy local variable in function...\n";
	for ($i = 0; $i< $NUMBER; $i++){
		$data_loc[$i] = "$i";
	}
	$t1 = microtime(true);
	$copy =  $data_loc;
	$t2 = microtime(true);
	$time = ($t2 - $t1) *1E6;
	echo "Finished in $time microseconds.\n\n"; #1.9 us
}

function f2(){
	global $data;
	echo "Copy global variable into function...\n";
	$t1 = microtime(true);
	$n = $data;
	$t2 = microtime(true);
	$time = ($t2 - $t1) *1E6;
	echo "Finished in $time microseconds.\n\n"; #9557 us
}

function f3($data){
	echo "Pass variable into function...\n";
	$t1 = microtime(true);
	$n = $data;
	$t2 = microtime(true);
	$time = ($t2 - $t1) *1E6;
	echo "Finished in $time microseconds.\n\n"; #0.95 us
}

f1();
f2();
f3($data);

echo "Note that as \$NUMBER is changed by factors of 10, the first two times 
don't change much, but the latter scales as O(\$NUMBER).\n";

?>
 [2013-03-26 06:34 UTC] php at richardneill dot org
... and getting closer to the source of the problem...

function a(){      //about 1400 us
	global $data;
	$n = $data;
}

function b(){      //much much quicker: about 18us
	$n = $GLOBALS['data'];
}


Something is definitely wrong here, that makes "global" so much slower than 
$GLOBALS.



Also, this isn't a read-only optimisation which makes b() much quicker than a(); 
it still applies even when we want to copy back into the global scope. 
Consider:

function x(){
     $n = $GLOBALS['data'];
     $GLOBALS['data'] = count($n);
}

and x() is always about as fast as b().


A secondary problem: if x() is rewritten as:

function y(){
     $GLOBALS['data'] = count($GLOBALS['data']);
}

then it can be either fast or slow depending on whether other function calls 
contain "global $data" or not.
 [2013-03-26 13:34 UTC] nikic@php.net
-Status: Open +Status: Not a bug
 [2013-03-26 13:34 UTC] nikic@php.net
The issue you are seeing is that `global $data` is a by-reference fetch (unlike `$data = $GLOBALS['data']`, which is by-value). The by-ref pass prevents us from doing a copy-on-write optimization when passing the array to the by-value count() function, so the array has to be copied every single time.

The specific count() case is something we could optimize by adding a read-only passing mode for function arguments (that never separates zvals). As that is something that's also useful for a few other things, it might well be worth doing.
 [2013-03-26 15:15 UTC] php at richardneill dot org
> The issue you are seeing is that `global $data` is a by-reference fetch 
> (unlike `$data = $GLOBALS['data']`, which is by-value). The by-ref pass 
> prevents us from doing a copy-on-write optimization when passing the array to 
> the by-value count() function, so the array has to be copied every single time.

Thanks for the explanation. That makes some sense as an explanation. However, it also implies that use of "$GLOBALS['var']" is always faster, sometimes much faster, than "global $var".  If this is true, might I suggest that this is a documentation issue that should be addressed?
 [2013-03-26 15:57 UTC] nikic@php.net
It's not always faster. Which one is faster is not very predictable without good knowledge of the engine. We don't address such low-level details in the manual.
 [2013-03-26 16:55 UTC] php at richardneill dot org
I see your point about optimisations being too low-level, but some of these things are hugely different in performance. Try running the following script:
http://www.richardneill.org/tmp/php-perf.txt

I notice 3 things. 
1) Updating a variable in-place is 1000x slower than creating a new one.
   Fast:   $n = count($data);
   Slow:    $data = count($data);

2) Using $GLOBALS['var'] inside a function is very similar to using the same variable in the main scope. Using "global $var" is always at least a bit slower.

3) Using global for just reading a variable is 100x slower than reading it from $GLOBALS inside a function.
   Fast:   $n = count($GLOBALS['data'])
   Slow:   global $data; $n = count ($data)

I think that (3) might be a bug. 

I also think that these differences are far from negligible, and so PHP's documentation *should* mention them.
 [2013-03-26 21:04 UTC] sixd@php.net
You can contribute documentation updates at https://edit.php.net. If you don't have an account, use "anonymous" in the "VCS login" field.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 21 15:01:29 2024 UTC