php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #19499 Populating a large array of arrays doesn't scale linearly
Submitted: 2002-09-19 08:36 UTC Modified: 2002-09-26 18:06 UTC
From: olivierS dot dagenaisP at canadaA dot comM Assigned:
Status: Not a bug Package: Performance problem
PHP Version: 4.2.3 OS: WinME
Private report: No CVE-ID: None
Welcome back! If you're the original bug submitter, here's where you can edit the bug or add additional notes.
If this is not your bug, you can add a comment by following this link.
If this is your bug, but you forgot your password, you can retrieve your password here.
Password:
Status:
Package:
Bug Type:
Summary:
From: olivierS dot dagenaisP at canadaA dot comM
New email:
PHP Version: OS:

 

 [2002-09-19 08:36 UTC] olivierS dot dagenaisP at canadaA dot comM
Steps to Reproduce:
0 - Standard PHP 4.2.3 installed using the installer.
1 - Create an array of strings using something like explode on a CSV line
2 - Add that array to another array
3 - Repeat steps 1 and 2 1000, 2000, 3000, 4000, 5000 and 6000 times

Example snippet (short):
$times = 6000;   /* change this to 1000-6000 */
$line = "'x','s','n','f','n','a','c','b','y','e',?,'s','s','o','o','p','o','o','p','o','c','l','e'";
$storage = array ( );
for ( $a = 0; $a < $times; $a++ )
{
  $bits = explode ( ",", $line );
  $storage[] = $bits;
}


Expected Behaviour:
The time it takes to do perform this is linearly proportional to the number of iterations it performs.


Actual Behaviour:
The results of running (on a Celeron 800) are as follows:
1000 - 2.174001 seconds
2000 - 22.422081 seconds
3000 - 74.593858 seconds
4000 - 148.223771 seconds
5000 - 254.329387 seconds
6000 - 371.621738 seconds

The time it takes pretty much doubles for every increase of 1000.


The funny thing is I was able to do similar operations independently with linear time.  For example, a slight modification to the script:

$times = 6000;
$line = "'x','s','n','f','n','a','c','b','y','e',?,'s','s','o','o','p','o','o','p','o','c','l','e'";
$storage = array ( );
$otherBits = explode ( ",", $line );
for ( $a = 0; $a < $times; $a++ )
{
  $bits = explode ( ",", $line );
  $storage[] = $otherBits;
}


...and it runs in less than half of a second, even at $times = 6000.  So I know I can add arrays to an array.




A much longer repro script (that demonstrates that these operations work fine independently in various combinations) follows:

------start long repro script----------
<pre>
<?php
  $trace = true;
  $start_time = 0;
  $times = 6000;

function getmicrotime ( )
{
  list ( $usec, $sec ) = explode ( " ", microtime () );
  return ( (float) $usec + (float) $sec );
}

function start ( $message )
{
  global $trace, $start_time;
  $start_time = getmicrotime ( );
  if ( $trace )
  {
    printf ( "%s... ", $message );
    flush ( );
  }
}


function stop ( )
{
  global $trace, $start_time;
  if ( $trace )
  {
    $current_time = getmicrotime ( );
    $runningTime = $current_time - $start_time;
    printf ( "%.6f seconds\n", $runningTime );
    flush ( );
  }
}

  set_time_limit ( 0 );
  error_reporting ( E_ALL );

  // allocate some memory so as to not bias the first result
  $storage = array ( );
  $line = "'x','s','n','f','n','a','c','b','y','e',?,'s','s','o','o','p','o','o','p','o','c','l','e'";
  $otherBits = explode ( ",", $line );
  for ( $a = 0; $a < $times; $a++ )
  {
    $storage[] = $line;
  }


  start ( "A: Adding $times times the same string to an array" );
  $storage = array ( );
  for ( $a = 0; $a < $times; $a++ )
  {
    $storage[] = $line;
  }
  stop ( );


  start ( "B: Adding $times times the same array to an array" );
  $storage = array ( );
  for ( $a = 0; $a < $times; $a++ )
  {
    $storage[] = $otherBits;
  }
  stop ( );


  start ( "C: Creating $times different arrays" );
  for ( $a = 0; $a < $times; $a++ )
  {
    $bits = explode ( ",", $line );
  }
  stop ( );


  start ( "D: Tests A and C" );
  $storage = array ( );
  for ( $a = 0; $a < $times; $a++ )
  {
    $storage[] = $line;
    $bits = explode ( ",", $line );
  }
  stop ( );


  start ( "E: Tests A and B" );
  $storage = array ( );
  for ( $a = 0; $a < $times; $a++ )
  {
    $storage[] = $line;
    $bits = explode ( ",", $line );
  }
  stop ( );

  start ( "F: Tests B and C with different arrays" );
  $storage = array ( );
  for ( $a = 0; $a < $times; $a++ )
  {
    $bits = explode ( ",", $line );
    $storage[] = $otherBits;
  }
  stop ( );


  start ( "G: Tests B and C with the array just created" );
  $storage = array ( );
  for ( $a = 0; $a < $times; $a++ )
  {
    $bits = explode ( ",", $line );
    $storage[] = $bits;
  }
  stop ( );

?>
</pre>
------end long repro script------------


Test "G" in the long repro script is the one that takes "forever" compared to the other operations.  Note that even though the same $line is exploded on every iteration, I am expecting these to be different (read from a file) in every iteration, hence the need to explode inside the loop.


I also checked the bug database.  Unlike http://bugs.php.net/bug.php?id=13598 I am not concerned with how much memory my script takes.  And I don't think that I am experiencing the same problem as http://bugs.php.net/bug.php?id=6333 because I *am* able to create an array of arrays the same size. (ie: test "F" in the long repro script)  I didn't find any other bugs that were similar to mine, although I might have missed something (sorry) because I wasn't sure how to express the problem.


Work-arounds appreciated (I tried pre-allocating with array_fill - no avail), but a fix would even be better.

Good luck and thanks in advance!

Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2002-09-19 18:37 UTC] olivierS dot dagenaisP at canadaA dot comM
Here is a somewhat more useful and probably more accurate repro script that will show how much time we have spent (so far) every 125 iterations.  It avoids the de-allocation time penalty we get with the long repro script included in the last contribution.  It also allows for easier plotting in your favorite spreadsheet software.  (just adjust the print function to output the numbers, copy and paste them into the spreadsheet)


<?php
function getmicrotime ( )
{
  list ( $usec, $sec ) = explode ( " ", microtime () );
  return ( (float) $usec + (float) $sec );
}

set_time_limit ( 0 );

$line = "'x','s','n','f','n','a','c','b','y','e',?,'s','s','o','o','p','o','o','p','o','c','l','e'";

$storage = array ( );
$start_time = getmicrotime ( );
for ( $times = 0; $times < 8000; $times += 125,
  print ( "$times = " . ( getmicrotime ( ) - $start_time ) . "\n" ) )
{
  for ( $a = 0; $a < 125; $a++ )
  {
    $storage[] = explode ( ",", $line );
  }
}

?>



I ran this new script on an Athlon 900 machine running Windows 98 and got the following results:

125 = 0.012104988098145
250 = 0.036535978317261
375 = 0.088302969932556
500 = 0.19990205764771
625 = 0.37106597423553
750 = 0.6065239906311
875 = 0.90365397930145
1000 = 1.253583073616
1125 = 1.6664990186691
1250 = 2.1331380605698
1375 = 2.6527270078659
1500 = 3.2355539798737
1625 = 3.8985589742661
1750 = 4.68299305439
1875 = 5.6240310668945
2000 = 6.781278014183
2125 = 8.2345640659332
2250 = 10.012183070183
2375 = 12.181863069534
2500 = 14.771154046059
2625 = 17.770683050156
2750 = 21.163032054901
2875 = 24.917178034782
3000 = 28.996559023857
3125 = 33.361265063286
3250 = 37.975730061531
3375 = 42.820783019066
3500 = 47.88387298584
3625 = 53.16420006752
3750 = 58.659978985786
3875 = 64.360311985016
4000 = 70.265311002731
4125 = 76.383864998817
4250 = 82.977863073349
4375 = 90.194562077522
4500 = 97.048099994659
4625 = 103.93731307983
4750 = 111.02486097813
4875 = 118.31274199486
5000 = 125.80013000965
5125 = 133.48784804344
5250 = 141.95425903797
5375 = 152.91942405701
5500 = 161.20297205448
5625 = 169.70409607887
5750 = 178.36672604084
5875 = 187.21697604656
6000 = 196.26174402237
6125 = 205.5024510622
6250 = 214.93042397499
6375 = 224.54999697208
6500 = 234.35991597176
6625 = 244.3618350029
6750 = 254.5541960001
6875 = 264.93612003326
7000 = 275.82330298424
7125 = 286.85267901421
7250 = 298.12616407871
7375 = 309.47215306759
7500 = 320.87820506096
7625 = 332.56023800373
7750 = 344.28222107887
7875 = 356.19291603565
8000 = 368.28939902782

HTH,
Oli
 [2002-09-26 18:06 UTC] iliaa@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Well, there is a very good reason when the 2nd example shows linear increase in time. The reason being that you are making a copy on write of $otherBits array, which means unless you modify that data all it will contain, is a pointer to a single $otherBits array. This means that you've created n times pointers to a single array.
The 1st example actually causes the creation of n times arrays and their subsequent assignment into storage array. Meaning that you are doing using up a lot more memory. Given the fact Window's memory menager is not the most effecient thing aroung and you may be going into swap you do not get a linear time increase.

Just as point of reference, the 1st test caused 35.5 megs of memory to be allocated, while the 2nd only caused 5.1 megs to be allocated.
 [2002-11-01 17:12 UTC] olivierS dot dagenaisP at canadaA dot comM
So I assume you get linear time on non-Windows machines?  I don't have a Linux machine handy, but you're saying allocating 35 MB should be trivial on that platform???
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Apr 19 05:01:29 2024 UTC