Bug #36983 preg_match_all incorrect due to backtrack length limit?
Submitted: 2006-04-05 12:41 UTC Modified: 2006-05-26 18:36 UTC
From: crisp at tweakers dot net Assigned: andrei (profile)
Status: Closed Package: PCRE related
PHP Version: 5.1.2, 4.4.2 OS: *
Private report: No CVE-ID: None
 [2006-04-05 12:41 UTC] crisp at tweakers dot net
Description:
------------
It seems that when PCRE has to backtrack over more than 20 characters to evaluate an alternation (an OR'ed expression), the results of preg_match_all are incorrect or incomplete.
See the code below for an example.

Reproduce code:
---------------
$foo = 'foo# bar# abcdefghijklmnopqrst bla#';
echo '$foo = ', $foo, "\n";

preg_match_all('/([a-y]|z)+#/', $foo, $matches1);
preg_match_all('/([a-y]+|z)+#/', $foo, $matches2);

print_r($matches1[0]);
print_r($matches2[0]);


$foo = 'foo# bar# abcdefghijklmnopqrstu bla#';
echo '$foo = ', $foo, "\n";

preg_match_all('/([a-y]|z)+#/', $foo, $matches1);
preg_match_all('/([a-y]+|z)+#/', $foo, $matches2);

print_r($matches1[0]);
print_r($matches2[0]);


Expected result:
----------------
Both expressions should give the following output for both strings:

Array
(
    [0] => foo#
    [1] => bar#
    [2] => bla#
)

Actual result:
--------------
The last expression gives the following output for the last string:

Array
(
    [0] => foo#
    [1] => bar#
)

The last match, bla#, is missing from the preg_match_all result.

 [2006-04-05 17:25 UTC] crisp at tweakers dot net
Indeed, this seems to be an issue with PCRE; version 6.6 still has this problem, so I'll try my luck with the PCRE team.
Nevertheless, I don't believe 'bogus' is the right status for this bug, since it is an actual issue, just not with PHP (although afaik PHP still ships with PCRE version 6.4).
I'll just set this bug to 'closed'.
 [2006-04-05 23:01 UTC] crisp at tweakers dot net
Just a small update on this. PCRE is actually issuing an error, specifically PCRE_ERROR_MATCHLIMIT (-8).
Apparently, when PCRE backtracks, it tries every combination of the OR'ed subpattern, resulting in over 10,000,000 calls to match(). Evidently there is no mechanism that first checks whether (and where) an alternative of the OR can match within the part being backtracked.

So this seems to be defined behaviour of PCRE (though it could use improvement if even a simple case like the one I constructed triggers it), but my question now is: why does PHP not raise a warning or error when PCRE exits with one?
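
To give a rough sense of why one extra letter matters: a pattern like ([a-y]+|z)+ can consume a run of letters by splitting it into consecutive chunks in many different ways, and a backtracking matcher may try each split before giving up. The sketch below only counts those splits for a single starting position. The function name countSplits is made up for this illustration, and how many match() calls PCRE actually spends per split is not stated in this report, so treat the numbers as an indication of growth rather than an exact call count.

```php
<?php
// Hypothetical helper for illustration only (not part of PHP or PCRE).
// Counts the ways a run of $n letters can be split into consecutive
// non-empty chunks, i.e. the ways ([a-y]+)+ could consume the whole run.
function countSplits(int $n): int
{
    // $ways[$k] = number of ways to split the first $k letters.
    $ways = [0 => 1];
    for ($k = 1; $k <= $n; $k++) {
        $ways[$k] = 0;
        // The last chunk may have any length from 1 to $k.
        for ($len = 1; $len <= $k; $len++) {
            $ways[$k] += $ways[$k - $len];
        }
    }
    return $ways[$n];
}

echo countSplits(strlen('abcdefghijklmnopqrst')), "\n";  // 20 letters: 524288
echo countSplits(strlen('abcdefghijklmnopqrstu')), "\n"; // 21 letters: 1048576
```

The count doubles with every added letter, which is at least consistent with a fixed match limit being crossed between the 20-letter and the 21-letter string in the reproduce code.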
 [2006-05-26 18:36 UTC] nlopess@php.net
PHP 5.2 already has a preg_last_error() function. It was chosen not to emit a warning, but rather to make the errors accessible through that function.

If you append the line below to your script, it will print true:
var_dump(preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR);
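
Since a failed match is otherwise silent, one way to avoid acting on incomplete results is to check preg_last_error() after every call. Below is a minimal sketch of such a wrapper. The name checkedMatchAll is hypothetical (it is not a PHP function), and the scalar type declarations assume PHP 7 or later. Whether the pattern from this report still hits the limit depends on the PHP and PCRE versions and their configured limits, so the sketch handles both outcomes.

```php
<?php
// Hypothetical wrapper (not part of PHP): runs preg_match_all() and throws
// if PCRE reported an error, instead of silently returning partial results.
function checkedMatchAll(string $pattern, string $subject): array
{
    $count = preg_match_all($pattern, $subject, $matches);
    $error = preg_last_error();

    if ($count === false || $error !== PREG_NO_ERROR) {
        $reason = $error === PREG_BACKTRACK_LIMIT_ERROR
            ? 'backtrack limit exhausted'
            : 'PCRE error code ' . $error;
        throw new RuntimeException("preg_match_all failed: $reason");
    }

    // Full-pattern matches only, as printed in the reproduce code.
    return $matches[0];
}

$foo = 'foo# bar# abcdefghijklmnopqrstu bla#';
try {
    print_r(checkedMatchAll('/([a-y]+|z)+#/', $foo));
} catch (RuntimeException $e) {
    // Reached only if PCRE gave up on this input.
    echo $e->getMessage(), "\n";
}
```

Throwing an exception is a design choice made for this sketch; returning null or logging the error code would work just as well, as long as the caller can tell a failed match from an empty one.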
 