PHP Bugs  

Bug #40846 pcre.backtrack_limit too restrictive
Submitted: 17 Mar 2007 7:19pm UTC    Modified: 20 May 2007 11:22am UTC
From: crisp at xs4all dot nl    Assigned to:
Status: Open    Category: Feature/Change Request
Version: 5.2.1    OS: all
Votes: 57    Avg. Score: 4.7 ± 0.6    Reproduced: 54 of 54 (100.0%)
Same Version: 29 (53.7%)    Same OS: 40 (74.1%)

[17 Mar 2007 7:19pm UTC] crisp at xs4all dot nl
Description:
------------
The new pcre.backtrack_limit configuration directive is too
restrictive by default (100,000), which causes many, often quite
simple, regular expressions to fail.

I take it that this directive overrides PCRE's default MATCH_LIMIT
setting. That would also mean the directive is misnamed, since
MATCH_LIMIT applies to every internal match() call in PCRE, not only
to those made while backtracking.

It would make more sense to change the defaults of both
pcre.backtrack_limit and pcre.recursion_limit to the ones PCRE itself
supplies (10,000,000 for both). That way there would be no
compatibility problem with previous versions of PHP, as there is now.

Reproduce code:
---------------
$a = 'baab' . str_repeat('a', 100024);
$b = preg_replace('/b.*b/', '', $a);

Expected result:
----------------
I would expect $b to contain 'a' repeated 100024 times.

Actual result:
--------------
$b is NULL, and preg_last_error() returns 2, which is
PREG_BACKTRACK_LIMIT_ERROR.
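A rough count shows why this short pattern trips the limit. The sketch below is only an estimate made from string offsets, not PCRE's own internal counter, and it assumes that greedy .* first runs to the end of the subject and then gives back one character per step until the second 'b' can match.

```php
<?php
// Rough estimate of the backtracking /b.*b/ needs on the subject above.
// Approximation only: PCRE's own match counter may differ slightly.
$a = 'baab' . str_repeat('a', 100024);

$first = strpos($a, 'b');   // offset of the first 'b'
$last  = strrpos($a, 'b');  // offset of the last 'b'

// After the first 'b', greedy .* swallows everything up to the end...
$consumed = strlen($a) - ($first + 1);
// ...and must shrink until it stops just before the last 'b'.
$needed = $last - ($first + 1);
$steps  = $consumed - $needed;

printf("first=%d last=%d steps=%d\n", $first, $last, $steps);
// first=0 last=3 steps=100025, which is above the 100,000 default
```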
[17 Mar 2007 7:26pm UTC] tony2001@php.net
>that way there won't be a compatibility problem with previous
>versions of PHP as we have now.
Changing the limit doesn't mean removing the limit.
[17 Mar 2007 8:16pm UTC] crisp at xs4all dot nl
>Changing the limit doesn't mean removing the limit.
But if you change the default limits to match the default limits set
inside PCRE, you won't change its behavior compared to previous
versions of PHP, where PCRE's internal settings were not
overridden.

Either that or don't override PCRE's internal settings unless these
directives are explicitly set and enabled in php.ini (at the moment
these directives are commented in the php.ini samples).
[19 May 2007 6:49am UTC] tigr at mail15 dot com
For me this new behaviour has broken my template system. While some of
the regexps could be simplified, others could not. In some situations
increasing these limits is of little help. For instance (the regexp
here is greatly simplified; in the real application it is much more
complex):

<?php

echo preg_replace('/\$[a-z]+([a-z]*(?:\[[a-z]*\])?)*/i',
    'replaced', '$abc $something[something]');

var_dump(preg_last_error());

?>

Expected result: 'replaced replaced replaced'. Actual result: nothing;
NULL is returned and preg_last_error() reports
PREG_BACKTRACK_LIMIT_ERROR. Increasing the backtrack limit leads to
another error, PREG_RECURSION_LIMIT_ERROR, and increasing the recursion
limit makes PHP hang.

Changing the first or second asterisk in the pattern to a plus sign
fixes the problem immediately, but I need the pattern as it is. Also, do
you think this is correct behaviour? I think there is a bug somewhere.

However, this works fine on PHP 4.x, 5.x and even 5.2.1 (on one of my
hosts), but it does not work on my local PHP 5.2.2 on Windows XP SP2.
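Nested quantifiers like the one above, an outer (...)* around an inner [a-z]* that can also match the empty string, give the engine many ways to split the same letters, which is a common source of runaway backtracking. The following is a hypothetical rewrite of the simplified pattern only (not of the real template pattern); it should accept the same strings, assuming the capture group is not needed, since the replacement does not use it.

```php
<?php
// Hypothetical rewrite of /\$[a-z]+([a-z]*(?:\[[a-z]*\])?)*/i.
// After the greedy [a-z]+ has taken every letter, each repetition must
// begin with a '[...]' pair, so a given match can only be divided one way.
$pattern = '/\$[a-z]+(?:\[[a-z]*\][a-z]*)*/i';

$out = preg_replace($pattern, 'replaced', '$abc $something[something]');

var_dump($out);                                // string(17) "replaced replaced"
var_dump(preg_last_error() === PREG_NO_ERROR); // bool(true)
```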
[19 May 2007 8:32am UTC] tigr at mail15 dot com
Sorry, a small mistake: the expected result is not 'replaced replaced
replaced' but 'replaced replaced'.
[20 May 2007 10:49am UTC] nlopess@php.net
We simply can't increase the recursion limit, or we risk segfaulting
PHP. Increasing the backtrack limit is also risky, but much safer
(although regexes that need a lot of backtracking are usually not well
written). I'll think more about this.
[20 May 2007 11:09am UTC] tigr at mail15 dot com
It is rather strange: previous versions work fine, executing all
patterns quickly. And in some situations (as I mentioned before),
increasing the recursion and backtrack limits just doesn't help. I
suppose this is wrong behaviour.

Also, note that the examples are short and simple. Increasing both
limits to 1,000,000 does not help; why?
[20 May 2007 11:22am UTC] crisp at xs4all dot nl
PHP 5.2.0 includes an update of the PCRE library (version 6.7), so some
problems may not be entirely due to the restrictive PCRE limits set in
PHP, but could be a bug or regression in PCRE itself.

PCRE has always been very poor at internally optimising expressions
that contain look-aheads or look-behinds, especially when they are
combined with OR'ed subexpressions. Its backtracking mechanism is
quite simplistic and doesn't rule out execution paths that cannot
lead to a match; in fact, it doesn't have any sort of execution
planner.
[16 Aug 2007 3:58pm UTC] brandon at invisionpower dot com
Installations of 5.2 are causing this issue for us with relatively
simple regexes. Users can submit an arbitrary amount of data (I work on
bulletin board software), which is parsed for BBCode tags. Even
simple expressions are causing problems.

$txt = preg_replace_callback("#\[code\](.+?)\[/code\]#is",
    array($this, 'regex_code_tag'), $txt);

var_dump(preg_last_error());

The callback function is never called, and the output is int(2)
(backtrack limit hit). Increasing the backtrack limit to 1,000,000
"resolves" the issue, but with shared hosting plans it's unrealistic to
expect hosts to make php.ini changes to allow this.

I agree with crisp: the limits in PHP should be set to PCRE's internal
defaults, with the php.ini settings allowing you to reduce them if you
wish. PHP should not arbitrarily reduce them.
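Since pcre.backtrack_limit and pcre.recursion_limit are changeable at runtime (the PHP manual lists them as PHP_INI_ALL), a script can raise the limit for its own request with ini_set() instead of depending on the host's php.ini. A minimal sketch follows; the value 1000000 is taken from the comment above as an example, and the post body is made up. It uses preg_match_all() rather than the poster's callback so that it runs on its own.

```php
<?php
// Raise the backtrack limit for this request only. ini_set() returns the
// old value on success and false on failure.
if (ini_set('pcre.backtrack_limit', '1000000') === false) {
    trigger_error('could not change pcre.backtrack_limit', E_USER_WARNING);
}

// Made-up post body: one long [code] block.
$txt = '[code]' . str_repeat('x', 50000) . '[/code]';

$count = preg_match_all('#\[code\](.+?)\[/code\]#is', $txt, $m);
if ($count === false) {
    // preg_match_all() returns false when PCRE gives up.
    echo 'PCRE error: ', preg_last_error(), "\n";
} else {
    echo $count, ' block(s), ', strlen($m[1][0]), " bytes\n";
}
```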
[16 Aug 2007 7:00pm UTC] drnick at physics dot byu dot edu
I just wanted to add that I completely agree with crisp. We recently
updated PHP on our web server to 5.2.3 and had issues with our template
system on inputs above a certain size (>100K).

The idea of allowing PHP to enforce limits on the PCRE library is fine,
but having the default action (when recursion_limit and backtrack_limit
are commented out) be PHP overriding PCRE's internal limits is VERY
problematic. Aside from badly breaking backward compatibility, I
believe PHP should not make arbitrary assumptions such as these.

I believe PHP should be changed so that the default action is to make
use of PCRE's internal limits, and if people want to enforce their own,
they can make that decision via the options. Perhaps modify
php.ini-recommended to explain the options and say why having external
limits can be good.
[10 Dec 2007 2:18pm UTC] daan at react dot nl
This issue is still a problem, plus this low setting is also the cause
of segfaults.
(see http://bugs.php.net/bug.php?id=43031)

At the moment even this simple regexp segfaults 5.2.5:
preg_match('/(.)*/', str_repeat('x', 6000));

I hope this is not intended behavior, as the reply to bug #43031
suggests.
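One way to avoid the deep recursion in this particular example is not to repeat a capturing group at all. As far as I can tell, the recursive matcher in the PCRE versions bundled with PHP 5.2 goes one level deeper for each iteration of a repeated group, while a single-character repeat such as .* does not. The pattern below is a hypothetical rewrite that should give the same overall match and the same capture (the last character) as /(.)*/. Separately, the PHP manual warns that a pcre.recursion_limit too high for the process stack can crash PHP, so lowering it can turn a segfault into a PREG_RECURSION_LIMIT_ERROR.

```php
<?php
$subject = str_repeat('x', 6000);

// Hypothetical rewrite of /(.)*/: .* is a single-character repeat rather
// than a repeated group, and (.) still captures the last character.
// The optional wrapper keeps the empty-match case working, as (.)* does.
$n = preg_match('/(?:.*(.))?/', $subject, $m);

var_dump($n);            // int(1)
var_dump(strlen($m[0])); // int(6000)
var_dump($m[1]);         // string(1) "x"
```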
[28 Aug 2009 8:54am UTC] tom at r dot je
This is still an issue in 5.3.

The low limit causes scripts that hit it to fail without a warning,
notice or error, creating hard-to-track-down bugs. For example,
something that works fine for one set of data stops working on another
set simply because the data is longer.

This cannot be the expected behaviour, surely?

At minimum there needs to be a warning. Ideally, though, the limits
should be set to the PCRE defaults.
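Until the defaults change, the silent failure can at least be made visible in user code. A minimal sketch, using a hypothetical wrapper named preg_replace_or_warn() (it is not part of PHP): it calls preg_replace() and, when the result is NULL, maps the preg_last_error() code to its constant name and raises an E_USER_WARNING.

```php
<?php
// Hypothetical wrapper, not a PHP built-in: preg_replace() plus a warning
// whenever it returns NULL because PCRE reported an error.
function preg_replace_or_warn($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // These error constants exist since PHP 5.2.0.
        $names = array(
            PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
            PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
            PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
            PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        );
        $code = preg_last_error();
        $name = isset($names[$code]) ? $names[$code] : 'error code ' . $code;
        trigger_error("preg_replace() failed for $pattern: $name", E_USER_WARNING);
    }
    return $result;
}

// Normal use behaves like preg_replace().
echo preg_replace_or_warn('/b+/', 'B', 'abbbc'), "\n"; // aBc
```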

PHP Copyright © 2001-2009 The PHP Group
All rights reserved.
Last updated: Sat Nov 21 10:30:49 2009 UTC