Bug #50264 Possible pcre memory leak
Submitted: 2009-11-22 18:47 UTC Modified: 2009-11-28 16:52 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: laszlo dot janszky at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.3.1 OS: Windows XP
Private report: No CVE-ID: None
 [2009-11-22 18:47 UTC] laszlo dot janszky at gmail dot com
Description:
------------
I have a large recursive regex (about 500 bytes) that needs a lot of memory for backtracking.

The regex matches templates such as:
{command1 arg1=$arg1 arg2=$arg2|modifier2 arg3="text"|modifier3:modarg31:modarg32}
etc.

When I use the regex with preg_match_all, the backtracking memory usage grows superexponentially with the number of commands.

So:
 ln ln M = 0.0787 * N + 1.9304   (R^2 = 0.9977 for the trendline)
 M = backtrack memory used, in bytes
 N = number of command calls

I don't think more than 1 MB of memory usage is normal for a 0.0002 MB (roughly 200-byte) string.

The recursion memory usage is normal (under 1 kB). I'm pretty disappointed, because I can't use my template engine due to a badly written PCRE engine.

Reproduce code:
---------------
<?php

$template1='
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
';

$template2='
{display var=$link}
{display var=$link}
{display var=$link}
{display var=$link}
test test test test test 
test test test test test 
test test test test test 
test test test test test 
test test test test test 
test test test test test 
test test test test test 
test test test test test 
test test test test test 
test test test test test
';

$regex='%\\{(?<function>(?:\\w+))(?:(?<list>\\s(?:[\\w_]+(?:\\s[\\w_]+)*\\s)?(?:\\$\\w+(?:->\\w+|\\.\\w+)*|"(?:.*?)"|\\d+(?:\\.\\d+)?)(?:\\|\\w+(?::(?:\\$\\w+(?:->\\w+|\\.\\w+)*|"(?:.*?)"|\\d+(?:\\.\\d+)?))*)*(?:\\s[\\w_]+(?:\\s[\\w_]+)*\\s(?:\\$\\w+(?:->\\w+|\\.\\w+)*|"(?:.*?)"|\\d+(?:\\.\\d+)?)(?:\\|\\w+(?::(?:\\$\\w+(?:->\\w+|\\.\\w+)*|"(?:.*?)"|\\d+(?:\\.\\d+)?))*)*)*(?:\\s[\\w_]+(?:\\s[\\w_]+)*)?)|(?<hash>(?:\\s\\w+=(?:\\$\\w+(?:->\\w+|\\.\\w+)*|"(?:.*?)"|\\d+(?:\\.\\d+)?)(?:\\|\\w+(?::(?:\\$\\w+(?:->\\w+|\\.\\w+)*|"(?:.*?)"|\\d+(?:\\.\\d+)?))*)*)*))(?:\\}(?<block>.*?(?:(?0).*?)*?)\\{/(?P=function))?\\}%usD';


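$one_Mb=1024*1024;
$one_kb=1024;

// Note: PHP passes both settings to PCRE as counts (the match limit and
// the recursion-depth limit), not as sizes in bytes.
ini_set('pcre.backtrack_limit', $one_Mb);
ini_set('pcre.recursion_limit', $one_kb);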

preg_match_all($regex,$template1,$matches1,PREG_SET_ORDER);
preg_match_all($regex,$template2,$matches2,PREG_SET_ORDER);


echo 'test1:<br />';
echo (!count($matches1)?'failed':'ok').'<br />';
echo 'test2:<br />';
echo (!count($matches2)?'failed':'ok').'<br />';


Expected result:
----------------
test1:
ok
test2:
ok

Actual result:
--------------
test1:
failed
test2:
ok
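The trendline from the description can be checked against this result. The sketch below evaluates ln ln M = 0.0787 * N + 1.9304 for the 4 commands in $template2 and the 10 commands in $template1, and compares the prediction with the 1,048,576 that the script sets as pcre.backtrack_limit. It assumes the fit extrapolates cleanly and reads M the way the report does; strictly, PHP hands pcre.backtrack_limit to PCRE as a limit on internal matching steps, so M is a count rather than a byte size. The function names are hypothetical.

```php
<?php
// Trendline from the bug description: ln(ln(M)) = 0.0787 * N + 1.9304,
// where N is the number of template commands and M the backtrack figure
// the reporter measured. Extrapolating the fit is an assumption.
function predicted_backtrack(int $n): float
{
    return exp(exp(0.0787 * $n + 1.9304));
}

// Hypothetical helper: smallest N whose predicted M exceeds $limit.
function first_n_exceeding(float $limit, int $maxN = 100): ?int
{
    for ($n = 1; $n <= $maxN; $n++) {
        if (predicted_backtrack($n) > $limit) {
            return $n;
        }
    }
    return null;
}

$limit = 1024 * 1024; // $one_Mb in the reproduce code
$templates = ['template2' => 4, 'template1' => 10]; // commands per template

foreach ($templates as $name => $n) {
    $m = predicted_backtrack($n);
    printf("%s (N=%d): predicted M = %.0f, %s the limit\n",
        $name, $n, $m, $m > $limit ? 'above' : 'below');
}
printf("First N above the limit: %d\n", first_n_exceeding($limit));
```

Under these assumptions the fit predicts roughly 12,600 for N = 4 and about 3.8 million for N = 10, and N = 9 is the first count above the limit, which is consistent with test1 failing while test2 passes.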

 [2009-11-22 18:53 UTC] laszlo dot janszky at gmail dot com
If I remove the recursive part
(?:\\}(?<block>.*?(?:(?0).*?)*?)\\{/(?P=function))?
from the end of the regex, it works fine.
 [2009-11-23 19:21 UTC] laszlo dot janszky at gmail dot com
The leak is related to this report:
http://bugs.php.net/bug.php?id=49333


Here is a simplified example with eight "withoutBlock" tokens:

<?php

ini_set('pcre.backtrack_limit', 40000);
ini_set('pcre.recursion_limit', 1000);

$pattern=
'%
	{(\w+)(?:}
		(.*?(?:(?0).*?)*?)
	{/\1)?}
%usDx';

$test='
{display}
{display}
{display}
{display}
{display}
{display}
{display}
{display}
';

preg_match_all($pattern,$test,$matches,PREG_SET_ORDER);
var_dump($matches);

?>
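The simplified example dumps $matches without checking why it might be empty, so "no match" and "PCRE gave up" look the same. A minimal sketch of such a check follows, using the same pattern written on one line without the x modifier. pcre_status() is a hypothetical helper; preg_last_error() and the PREG_*_ERROR constants are standard PHP. Which status the demo prints depends on the PHP/PCRE version and on whether the PCRE JIT is enabled, so no output is claimed here.

```php
<?php
// Hypothetical helper (not part of PHP): combines a preg_* return value
// with preg_last_error() so an empty result can be told apart from an error.
function pcre_status($result, int $error): string
{
    if ($result === false || $error !== PREG_NO_ERROR) {
        switch ($error) {
            case PREG_BACKTRACK_LIMIT_ERROR:
                return 'backtrack limit exhausted';
            case PREG_RECURSION_LIMIT_ERROR:
                return 'recursion limit exhausted';
            case PREG_BAD_UTF8_ERROR:
                return 'malformed UTF-8 subject';
            default:
                return 'other PCRE error';
        }
    }
    return $result > 0 ? 'ok' : 'no match';
}

// The simplified pattern from the comment above, without /x.
$pattern = '%{(\w+)(?:}(.*?(?:(?0).*?)*?){/\1)?}%usD';
$subject = str_repeat("{display}\n", 8);

ini_set('pcre.backtrack_limit', '40000');
$count = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
echo pcre_status($count, preg_last_error()), "\n";
```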

The basic syntax is:
  {withBlock}block{/withBlock}
or
  {withoutBlock}


Because the {withBlock} opener has the same structure as {withoutBlock}, the engine starts collecting the text after each {withoutBlock} onto the backtrack stack. For some reason, though, backtracking for {withoutBlock} consumes memory superexponentially, rather than linearly as in the {withBlock} case.



I measured the memory usage with the simplified example. It was not superexponential, just exponential. I think this is because the example has only two capturing groups, not many as in the original code.

(M1: withBlock test string, M2: withoutBlock test string, N: number of tokens)

tokens (N)   M1 [bytes]   M2 [bytes]   ln(M2)
1            19           22           3.0910
2            53           115          4.7449
3            87           405          6.0039
4            121          1286         7.1593
5            155          3940         8.2789
6            189          11913        9.3854
7            223          35843        10.4869
8            257          107644       11.6204

M1 = 34 * N - 15                (R^2 = 1)

M2 = exp(1.1192 * N + 2.6669)   (R^2 = 0.9999 for N = 3-8)
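Both trendlines can be reproduced with an ordinary least-squares fit. The sketch below fits M1 on all eight rows and ln(M2) on rows N = 3-8, using the ln(M2) column exactly as printed. One caveat: ln(107644) is about 11.587, not the 11.6204 in the last row, so either that M2 value or its logarithm looks like a typo; the fit uses the printed logarithms because those reproduce the reported coefficients. linear_fit() is a hypothetical helper.

```php
<?php
// Hypothetical helper: ordinary least-squares fit y = slope * x + intercept.
function linear_fit(array $x, array $y): array
{
    $n  = count($x);
    $mx = array_sum($x) / $n;
    $my = array_sum($y) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $sxy += ($x[$i] - $mx) * ($y[$i] - $my);
        $sxx += ($x[$i] - $mx) ** 2;
    }
    $slope = $sxy / $sxx;
    return [$slope, $my - $slope * $mx];
}

// M1 (withBlock) for N = 1..8, straight from the table.
$m1 = [19, 53, 87, 121, 155, 189, 223, 257];
[$a1, $b1] = linear_fit(range(1, 8), $m1);

// ln(M2) (withoutBlock) as printed, rows N = 3..8 only.
$lnM2 = [6.0039, 7.1593, 8.2789, 9.3854, 10.4869, 11.6204];
[$a2, $b2] = linear_fit(range(3, 8), $lnM2);

printf("M1 = %.4f * N %+.4f\n", $a1, $b1);
printf("M2 = exp(%.4f * N %+.4f)\n", $a2, $b2);
```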

By the way, that's a funny memory usage pattern.
 [2009-11-23 19:38 UTC] laszlo dot janszky at gmail dot com
In case it isn't clear, here are the test strings:

The 8-token withBlock (M1) test string is:

$test='
{display}
{display}
{display}
{display}
{display}
{display}
{display}
{display}
{/display}
{/display}
{/display}
{/display}
{/display}
{/display}
{/display}
{/display}
';

and the 8-token withoutBlock (M2) test string is:

$test='
{display}
{display}
{display}
{display}
{display}
{display}
{display}
{display}
';
 [2009-11-27 17:57 UTC] jani@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php


 [2009-11-27 23:07 UTC] laszlo dot janszky at gmail dot com
OK.
If it's not a bug, then what is it?
 [2009-11-27 23:26 UTC] rasmus@php.net
Did you test it with the command line pcre test tool?  This is unlikely 
to have anything to do with php-specific code.
 [2009-11-28 02:37 UTC] laszlo dot janszky at gmail dot com
So you mean this is a PCRE memory-handling problem, not a PHP bug? Okay, then where can I report this issue?

(Sorry for my bad English, I'm just med- ...)
 [2009-11-28 02:40 UTC] laszlo dot janszky at gmail dot com
"Did you test it with the command line pcre test tool?"

I did not test it.
 [2009-11-28 16:52 UTC] laszlo dot janszky at gmail dot com
Ok.

Finally, here is my backtrack memory tester. If it's not a bug, then bye-bye.

<?php
header('content-type: text/html; charset=utf-8');
header("Cache-Control: no-cache, must-revalidate");
header("Expires: Sat, 26 Jul 1997 05:00:00 GMT");


ini_set('pcre.recursion_limit', 10000);

$kb=1024;
$Mb=1024*1024;

$max_memory_size=50*$Mb;

$pattern=
'%
    {(\w+)(?:}
        (.*?(?:(?0).*?)*?)
    {/\1)?}
%usDx';


if (isset($_POST['test']) && $_POST['test']!='')
{
    // Sanity check: does the pattern match at all under the largest allowed limit?
    ini_set('pcre.backtrack_limit', $max_memory_size);
    if (preg_match_all($pattern,$_POST['test'],$m,PREG_SET_ORDER))
    {
        // Binary search for the smallest pcre.backtrack_limit that still matches.
        // $b always holds a limit that succeeds; $a a limit that fails (or the lower bound).
        $a=1;
        $b=$max_memory_size;
       
        while (abs($a-$b)>1)
        {
            $c=(int)(($a+$b)/2);
            ini_set('pcre.backtrack_limit', $c);
            if (preg_match_all($pattern,$_POST['test'],$m,PREG_SET_ORDER))
            {
                $b=$c;
            }
            else
            {
                $a=$c;
            }
        }
        // Report $b, not the last midpoint $c, which may be a failing value.
        $limit=$b;
    }
    ?><div style="float: left; margin: 10px; "><?php
    if (isset($limit))
    {
        ?>Memory consumed:<br /><?php echo ($limit/$kb); ?> kB<br /><?php
    }
    else
    {
        ?>Memory consumption exceeded the allowed quota, or the pattern does not match.<br /><?php
    }
    ?><br /><br />Tested text: <br /><pre><?php echo htmlspecialchars($_POST['test'], ENT_QUOTES, 'UTF-8'); ?></pre><?php
    ?></div><?php
   
}

?>
<div style="float: left; margin: 10px;">
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES, 'UTF-8'); ?>" method="post" enctype="application/x-www-form-urlencoded" accept-charset="utf-8">
<input type="submit" value="test1" style="vertical-align: top" />
<textarea id="test1" name="test" rows="18">
1.){display}
2.){display}
3.){display}
4.){display}
5.){display}
6.){display}
7.){display}
8.){display}
9.){display}
</textarea>
</form>
</div>

<div style="float: left; margin: 10px;">
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES, 'UTF-8'); ?>" method="post" enctype="application/x-www-form-urlencoded" accept-charset="utf-8">
<input type="submit" value="test2" style="vertical-align: top" />
<textarea id="test2" name="test" rows="18">
1.){display}
2.){display}
3.){display}
4.){display}
5.){display}
6.){display}
7.){display}
8.){display}
9.){display}
{/display}
{/display}
{/display}
{/display}
{/display}
{/display}
{/display}
{/display}
{/display}
</textarea>
</form>
</div>
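The same binary search can be run from the command line, which makes it easy to sweep the number of tokens. The sketch below is a hypothetical refactoring of the tester into a function; like the tester, it assumes that a larger pcre.backtrack_limit never turns a successful match into a failure. Figures obtained this way on a current PHP (PCRE2, with pcre.jit on by default) are not expected to match the PHP 5.3 numbers in this report; running it with `php -d pcre.jit=0` at least takes the JIT out of the comparison.

```php
<?php
// Hypothetical CLI version of the tester above: binary-searches the smallest
// pcre.backtrack_limit at which preg_match_all() still reports a match.
// Assumes success is monotonic in the limit.
function min_backtrack_limit(string $pattern, string $subject, int $max): ?int
{
    $old = ini_get('pcre.backtrack_limit');
    try {
        ini_set('pcre.backtrack_limit', (string) $max);
        if (!preg_match_all($pattern, $subject, $m)) {
            return null; // no match, or even $max is not enough
        }
        $lo = 0;    // never tested; treated as failing
        $hi = $max; // known to succeed
        while ($hi - $lo > 1) {
            $mid = intdiv($lo + $hi, 2);
            ini_set('pcre.backtrack_limit', (string) $mid);
            if (preg_match_all($pattern, $subject, $m)) {
                $hi = $mid;
            } else {
                $lo = $mid;
            }
        }
        return $hi;
    } finally {
        ini_set('pcre.backtrack_limit', $old); // restore the caller's setting
    }
}

// Sweep 1..8 block-less tokens with the simplified pattern from the thread.
$pattern = '%{(\w+)(?:}(.*?(?:(?0).*?)*?){/\1)?}%usD';
for ($n = 1; $n <= 8; $n++) {
    $subject = str_repeat("{display}\n", $n);
    $limit = min_backtrack_limit($pattern, $subject, 50 * 1024 * 1024);
    printf("%d tokens: %s\n", $n, $limit === null ? 'no result' : (string) $limit);
}
```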
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Dec 31 15:01:34 2024 UTC