php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #40261 Extremely slow data handling due to memory fragmentation
Submitted: 2007-01-28 00:30 UTC Modified: 2007-03-20 07:14 UTC
Votes:6
Avg. Score:4.7 ± 0.7
Reproduced:6 of 6 (100.0%)
Same Version:4 (66.7%)
Same OS:5 (83.3%)
From: thuejk at gmail dot com Assigned: dmitry (profile)
Status: Closed Package: Performance problem
PHP Version: 5.2.0 OS: All
Private report: No CVE-ID: None
View Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
If you reported this bug, you can edit this bug over here.
(description)
Block user comment
Status: Assign to:
Package:
Bug Type:
Summary:
From: thuejk at gmail dot com
New email:
PHP Version: OS:

 

 [2007-01-28 00:30 UTC] thuejk at gmail dot com
Description:
------------
I have some code which produces unacceptible performance in a specific situation. Making a completely trivial change improves performance a hundred-fold or more.

The problem is 100% reproducible.

The code looks something like:

/* pseudocode*/
function get_data_from_pgsql(){
  ...
  $map = Array();
  foreach ($rows as $row) {
     $map = row[index];
  }
  return $map;
}
$data1 = get_data_from_pgsql
$data2 = get_data_from_pgsql

One run with 10000 rows of result took 8.77 seconds, which is clearly silly. Making the extremely trivial change of moving the code block 
  foreach ($rows as $row) {
     $map = row[index];
  }
out of the function get_data_from_pgsql(), the code suddently only takes 0.1 seconds to run (factor 90)! Having more rows in the result makes the factor difference larger; it seems to increase quadratically.

Not saving the return values in $data1 and $data2 also improves performance immensely.

PHP 5.1.5 did not have this problem (or at least it was much smaller).

Reproduce code:
---------------
http://thuejk.dk/test.php.txt

You need a running postgresql database with one table, which has at least 10000 entries.

change the line
  $translate_outside_function = false;
at the top of the file to see the difference mentioned above.


Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2007-01-28 01:15 UTC] tony2001@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.


 [2007-01-28 01:25 UTC] thuejk at gmail dot com
I added the requested changes to the linked file http://thuejk.dk/test.php.txt
 [2007-01-28 01:30 UTC] tony2001@php.net
Could please modify the script, so that it would be _short_ but complete?
 [2007-01-28 01:47 UTC] thuejk at gmail dot com
I have made it a bit shorter.

I left the DebugStats class in, it is useful. Just treat it as a black box. I can garantie that the two calls into it does not account for the 8 seconds performance problem...
 [2007-01-28 01:52 UTC] thuejk at gmail dot com
On second though, black boxes are not good for testing. A simpler timer added...
 [2007-01-29 01:42 UTC] thuejk at gmail dot com
Ok, after a good deal of work I actually tracked down the problem to fragmentation in your memory management.

This understanding enabled me to construct a simpler test case. Note that the problem is unrelated to database function, and this new testcase does not require any database.

<?php

//According to my calculations, setting $num=18000000 should make this script take ~one year.
$num = 100000;

$a = Array();
for ($i=0; $i<$num; $i++) {
  $a[$i] = Array(1);
}
/* The underlying memory now looks something like
 *  row structure
 *  row value
 *  row structure
 *  ...
 */

$b = Array();
for ($i=0; $i<$num; $i++) {
  $b[$i] = $a[$i][0];
}
/* The b rows are allocated
 *
 * Though I haven't checked it in the code, I have reason to believe
 * that the values inserted into the b rows are pointers to the same
 * memory allocated for the values in the a rows
 */

unset($a);
/* The a rows are unallocated, but the values are still references from $b, so the memory map looks something like
 *  row value
 *  free_block
 *  row_value
 *  free_block
 * ...
 *
 * repeated $num times.
 */

/* Now, for each memory allocation for a row, PHP runs through all
 * free blocks checking for a best fit. Since there are $num free
 * blocks, this takes time. This is done by the function
 * _zend_mm_alloc_int in PHP 5.2.0 Zend/zend_alloc.c :
 *
 *  end = &heap->free_buckets[0];
 *  for (p = end->next_free_block; p != end; p = p->next_free_block) {
 *    size_t s = ZEND_MM_FREE_BLOCK_SIZE(p);
 *    if (s > true_size) {
 *      if (s < best_size) {    // better fit
 *        best_fit = p;
 *        best_size = s;
 *      }
 *    } else if (s == true_size) {
 *      // Found "big" free block of exactly the same size
 *      best_fit = p;
 *       goto zend_mm_finished_searching_for_block;
 *    }
 *  }
 */
$c = Array();
for ($i=0; $i<$num; $i++) {
  $c[$i] = 1;
}

?>
 [2007-01-29 13:24 UTC] thuejk at gmail dot com
I added a few printf's and found out that the $num generated "holes" in the memory are 384 big.

Zend has a special system for small allocations, but that only works for holes below ZEND_MM_SMALL_SIZE=280.

The $num allocations at the end, which cause the problems, and 40 or 88 long.

Note that these numbers are from a 64-bit machine, which of course has 8-byte pointers, and so a larger MM overhead. (The bug does also occur on 32bit machines)

One temporary solution could be to raise
#define ZEND_MM_NUM_BUCKETS 32
from which ZEND_MM_SMALL_SIZE is defined, so that ZEND_MM_SMALL_SIZE is made twice or perhaps 4 times as big. (I got a segfault when I tried that, haven't looked into why)

A better solution would be to organize the free blocks in a balanced tree, instead of a linear list which has to be traversed.
 [2007-01-30 14:03 UTC] dmitry@php.net
Please try PHP_5_2 snapshot. It already uses litle bit different "best-fit" implementation and this script takes reasonable time (near the same as 5.1).
 [2007-01-30 14:53 UTC] thuejk at gmail dot com
The latest PHP snapshot does fix my example, and probably makes my production code work.

This will probably fix the problem in far most cases. But it is possible to construct an example which still have the problematic behavior. One example is below.

<?php

$num = 100000;

$a = Array();
for ($i=0; $i<$num; $i++) {
  $a[$i] = Array(1);
}

for ($i=0; $i<$num; $i++) {
  $b[$i] = $a[$i][0];
}

unset($a);
for ($i=0; $i<$num; $i++) {
  $b[$i] = "1234567890123456789012345678901234567890123456789012345678901234567\
8901234567890123456789012345678901234567890123456789012345678901234567890123456\
7890123456789012345678901234567890123456789012345678901234567890123456789012345\
6789012345678901234567890123456789012345678901234567890123456789012345678901234\
5678901234567890";
}

?>
 [2007-03-16 12:09 UTC] thuejk at gmail dot com
I think I am hitting this in practice, or something like it.

I have a function call

<?
function lala() {
  [database access with lots of data]
  echo "returning " . time();
  return;
}
lala();
echo "returned ".time();
?>

And I can see that for some reason, the time between "returning" and "returned" is 60 seconds! This only happens the first time this function is called, for some reason. Installing php 5.1.6 it returns instantaneously.

I liked thet PHP 5.1 memory allocator better :(. The PHP 5.1 memory allocator was also 1/4 the size of the PHP 5.2 memory allocator.
 [2007-03-16 13:02 UTC] bloire at citytoo dot com
I have exactly the same probleme with foreach with big rows from mysql and reaffection for each field for each row

After that, all processing is verry verry verry VERRY slow. It's HORRIBLE. It's verry strange because I didn"t have this probleme with php < 5.2

I hope that will be repair because now, I'm not trusting with php 5.2

Thank you for all
 [2007-03-16 13:35 UTC] bloire at citytoo dot com
My version is 5.2.1 the lastest with mandake limited edition 2005 and fedora core 4

The probleme exist always with php 5.2.1 !!!!!!!!!!! The latest release !


Probleme looks like :

function test(&$b){

	//sql Query...
	//$a_result_mysql

	//No problem with performance with php 5.2.1  (6 secondes)
	foreach($a_result_mysql as $i=> $a_row) {
		$b[artist][item][$i] = $a_row;
	}
	
	//Very very very very very very very very very slow with php 5.2 (180 secondes)
	//normal with php < 5.2  (10 secondes)
	foreach($a_result_mysql as $i=> $a_row) {
		$b[artist][item][$i][name] = $a_row[name];
		$b[artist][item][$i][firstname] = $a_row[firstname];
		$b[artist][item][$i][address] = $a_row[address];
		$b[artist][item][$i][zip] = $a_row[name];
		$b[artist][item][$i][color] = $a_row[color];
	}
}
test($b);
 [2007-03-16 13:57 UTC] bloire at citytoo dot com
Real example ($a_output is a reference with 3000 results from mysql)
The probleme processing with memory is after these differents solutions
Time for processing Solution 1 or 2 is not verry different.
The real probleme is after Solution 1 or 2 for all other processing with memory !

$a_input[idlg] = $a_context[idlg];
node::select_node_no_finalsite($a_input,$a_output);
foreach ($a_output as $i => $a_row) {
	/*
        //Solution 1 : KO because verry verry verry SLOW 
	$a_data[a_formulaire][listenode2][$i][idn] 		 = $a_row[idn]; 
	$a_data[a_formulaire][listenode2][$i][traduction_idn]  = $a_row[traduction_idn];
	$a_data[a_formulaire][listenode2][$i][traduction_idcat]= $a_row[traduction_idcat];
	$a_data[a_formulaire][listenode2][$i][idte_idn]	 = $a_row[idte_idn]; 
	$a_data[a_formulaire][listenode2][$i][idte_idcat]	 = $a_row[idte_idcat];
	$a_data[a_formulaire][listenode2][$i][idcat]		 = $a_row[idcat];
	$a_data[a_formulaire][listenode2][$i][name]   		= $a_row[name];
	$a_data[a_formulaire][listenode2][$i][idsite]   	= $a_row[idsite];
	*/
        
        //Solution 2 : OK :-) 15 milliseconds
	$a_data[a_formulaire][listenode2] = $a_row;
}


//Times here depends choice at top lines !!!   
//Times is  6518 milliseconds if solution 1 KO   :-(
//Times is 16 milliseconds if solution is 2 OK
$instance_time=new time;
$instance_time->execute("test","start");
for($i=0;$i<50000;$i++){
	$a .= "k";
}
//print $a;
$instance_time->execute("test","stop");
print $instance_time->show();


Verry stange behavior for only 50000 concats !!!!!!!!!!!
It's not normal because no probleme before php < 5.2
 [2007-03-20 06:49 UTC] dmitry@php.net
Fixed in CVS HEAD and PHP_5_2.
 [2007-03-20 07:14 UTC] thuejk at gmail dot com
Excellent, thanks :).

I was irritated enough by this bug that I was thinking about implemented my own red-black balanced tree for the free list.

Just curious, what did you do to fix it? I can't see any relevant messages at http://marc.info/?l=php-cvs&r=1&b=200703&w=2 in the near past.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 11:01:29 2024 UTC