| Bug #40846 | pcre.backtrack_limit too restrictive | ||||
|---|---|---|---|---|---|
| Submitted: | 17 Mar 2007 7:19pm UTC | Modified: | 20 May 2007 11:22am UTC | ||
| From: | crisp at xs4all dot nl | Assigned to: | |||
| Status: | Open | Category: | Feature/Change Request | ||
| Version: | 5.2.1 | OS: | all | ||
| Votes: | 57 | Avg. Score: | 4.7 ± 0.6 | Reproduced: | 54 of 54 (100.0%) |
| Same Version: | 29 (53.7%) | Same OS: | 40 (74.1%) | ||
[17 Mar 2007 7:26pm UTC] tony2001@php.net
>that way there won't be a compatibility problem with previous versions of PHP as we have now.

Changing the limit doesn't mean removing the limit.
[17 Mar 2007 8:16pm UTC] crisp at xs4all dot nl
>Changing the limit doesn't mean removing the limit.

But if you change the default limits to match the default limits set in PCRE internally, you won't affect its behavior compared to previous versions of PHP, where PCRE's internal settings were not overridden. Either that, or don't override PCRE's internal settings unless these directives are explicitly set and enabled in php.ini (at the moment these directives are commented out in the php.ini samples).
[19 May 2007 6:49am UTC] tigr at mail15 dot com
For me this new behaviour has broken my template system. While some of
the regexps were simplified, others could not be. In some
situations increasing these numbers is of little help. For instance (the
regexp was simplified greatly; in the real-life application it is much more
complex):
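<?php
// Replace every $variable or $variable[index] token with 'replaced'.
echo preg_replace('/\$[a-z]+([a-z]*(?:\[[a-z]*\])?)*/i',
    'replaced', '$abc $something[something]');
// var_dump() prints its argument itself, so no echo is needed.
var_dump(preg_last_error());
?>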
Expected result: 'replaced replaced replaced'. Actual result: nothing is printed;
NULL is returned, and preg_last_error() reports
PREG_BACKTRACK_LIMIT_ERROR. Increasing the backtrack limit leads
to another error, PREG_RECURSION_LIMIT_ERROR, and increasing the recursion limit
makes PHP hang.
Changing the first or second asterisk in the pattern to a plus sign immediately
fixes the problem, but I need it this way. Also, do you think that
this is correct behaviour? I think there is a bug somewhere.
However, this works fine on PHP 4.x, 5.x and even 5.2.1 (on
one of the hosts), but it does not work on my local PHP 5.2.2 on
Windows XP SP2.
[19 May 2007 8:32am UTC] tigr at mail15 dot com
Sorry, a small mistake: the expected result is not 'replaced replaced replaced' but 'replaced replaced'.
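One way to sidestep the nested quantifier in the simplified pattern above is sketched below. It assumes the capture group is not needed and that the real, more complex pattern can be restructured the same way, which the thread does not confirm. The rewritten pattern is a hypothetical alternative, not something proposed by the reporter: it accepts the same strings (letters followed by any mix of letters and bracketed parts) but never repeats a group that can match the empty string.

```php
<?php
// Pattern as posted (for reference): /\$[a-z]+([a-z]*(?:\[[a-z]*\])?)*/i
// Hypothetical rewrite: each repetition of the group must consume a
// bracketed part, so the group can no longer match the empty string.
$rewritten = '/\$[a-z]+(?:\[[a-z]*\][a-z]*)*/i';

$subject = '$abc $something[something]';
$result  = preg_replace($rewritten, 'replaced', $subject);

if ($result === null) {
    // preg_replace() returns NULL when PCRE reports an error.
    echo 'PCRE error code: ', preg_last_error(), "\n";
} else {
    echo $result, "\n"; // prints "replaced replaced"
}
```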
[20 May 2007 10:49am UTC] nlopess@php.net
We simply can't increase the recursion limit, or we risk segfaulting PHP. Increasing the backtrack limit is also risky, but much safer (although regexes with a lot of backtracking are usually not well written). I'll think more about this.
[20 May 2007 11:09am UTC] tigr at mail15 dot com
It is kind of strange: previous versions work fine, executing all patterns quickly. And in some situations (as I mentioned before) increasing the recursion and backtrack limits just doesn't help. I think that's wrong behaviour. Also, note that the examples are pretty short and simple. Increasing both limits to 1,000,000 does not help. Why?
[20 May 2007 11:22am UTC] crisp at xs4all dot nl
PHP 5.2.0 includes an update of the PCRE library (version 6.7), so some problems may not be entirely due to the restrictive limits of the PCRE settings in PHP but could be a bug or regression in PCRE itself. PCRE has always been very poor at internal optimisation of expressions that contain look-aheads or look-behinds, especially when they are combined with some OR'ed subexpression. Its backtracking mechanism is quite simplistic and doesn't rule out execution paths that are sure not to result in a match; in fact, it doesn't have any sort of execution planner.
[16 Aug 2007 3:58pm UTC] brandon at invisionpower dot com
Installations of 5.2 are causing this issue for us with relatively simple regexes. Users can submit an arbitrary amount of data (I work on bulletin board software) that is parsed for BBCode tags. Even simple expressions are causing problems:

$txt = preg_replace_callback( "#\[code\](.+?)\[/code\]#is", array( &$this, 'regex_code_tag' ), $txt );
var_dump( preg_last_error() );

The callback function is not being hit at all and the output is int(2) (backtrack limit hit). Increasing the backtrack limit to 1,000,000 "resolves" the issue, but with shared hosting plans it's unrealistic to expect hosts to make php.ini changes to allow this. I agree with crisp: the limit in PHP should be set to the internal PCRE defaults, with the php.ini settings allowing you to reduce them if you wish. PHP should not arbitrarily reduce them.
[16 Aug 2007 7:00pm UTC] drnick at physics dot byu dot edu
I just wanted to say that I completely agree with crisp. We recently updated PHP on our webserver to 5.2.3 and had issues with our template system on inputs above a certain size (>100K). The idea of allowing PHP to enforce limits on the PCRE library is fine, but having the default action (when recursion_limit and backtrack_limit are commented out) be PHP overriding PCRE's internal limits is VERY problematic. Aside from badly breaking backwards compatibility, I believe PHP should not make arbitrary assumptions such as these. I believe PHP should be changed so that the default action is to use PCRE's internal limits, and if people want to enforce their own, they can make that decision via the options. Perhaps modify php.ini-recommended to explain the options and say why having external limits can be good.
[10 Dec 2007 2:18pm UTC] daan at react dot nl
This issue is still a problem, and this low setting is also the cause of segfaults (see http://bugs.php.net/bug.php?id=43031). At the moment even this simple regexp segfaults 5.2.5:

preg_match('/(.)*/', str_repeat('x', 6000));

I hope that is not intended behavior, as is suggested in the reply in bug report 43031.
[28 Aug 2009 8:54am UTC] tom at r dot je
This is still an issue in 5.3. The low limit causes scripts which hit it to fail without a warning, notice or error, creating hard-to-track-down bugs. For example, something which works fine for one set of data stops working on another set just because the data is longer. This cannot be the expected behaviour, surely? At minimum there needs to be a warning. Ideally, though, the limit should be set to the PCRE defaults.
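Until such a warning exists, the silent failure can be surfaced in user code. The following is a minimal sketch, assuming a hypothetical helper named checked_preg_replace() (it is not part of PHP); it relies only on preg_replace() returning NULL on error and on preg_last_error().

```php
<?php
// Hypothetical helper, not part of PHP: run preg_replace() and turn a
// silent PCRE failure into an exception.
function checked_preg_replace($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // NULL signals an error; preg_last_error() tells which one
        // (2 is PREG_BACKTRACK_LIMIT_ERROR).
        throw new RuntimeException(
            'preg_replace failed, PCRE error code ' . preg_last_error()
        );
    }
    return $result;
}

try {
    echo checked_preg_replace('/b.*b/', '', 'baabaaa'), "\n"; // prints "aaa"
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

The same check applies to the other preg_* functions, although their error return values differ (preg_match(), for example, returns FALSE rather than NULL).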

Description:
------------
The new pcre.backtrack_limit configuration directive is by default too restrictive (100,000), which results in the failure of many, often quite simple, regular expressions. I take it that this directive overrides the default setting for MATCH_LIMIT in PCRE, which also implies that the directive is misnamed, since MATCH_LIMIT counts every match() call in PCRE, not only those for backtracking. It would make more sense to change the defaults of both pcre.backtrack_limit and pcre.recursion_limit to the ones that PCRE itself supplies (10,000,000 for both); that way there won't be a compatibility problem with previous versions of PHP as we have now.

Reproduce code:
---------------
$a = 'baab' . str_repeat('a', 100024);
$b = preg_replace('/b.*b/', '', $a);

Expected result:
----------------
I would expect $b to contain 100024 times 'a'

Actual result:
--------------
$b is NULL; preg_last_error() reports 2, which is PREG_BACKTRACK_LIMIT_ERROR
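The reproduce code fails because the greedy `.*` first consumes the whole subject and then has to give back roughly 100024 characters one at a time before the final `b` can match, which is just over the 100,000 default. As a hedged workaround sketch, the limit can be raised per script with ini_set(), assuming the host has not disabled that function; the value 10000000 is the PCRE default quoted in the description, not a recommendation.

```php
<?php
// Raise the limit for this script only; pcre.backtrack_limit can be
// changed at runtime. 10000000 is PCRE's own default per the report.
ini_set('pcre.backtrack_limit', '10000000');

$a = 'baab' . str_repeat('a', 100024);
$b = preg_replace('/b.*b/', '', $a);

if ($b === null) {
    // Still failing: report which PCRE error was hit.
    echo 'PCRE error code: ', preg_last_error(), "\n";
} else {
    echo strlen($b), "\n"; // 100024 when the match succeeds
}
```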