When running preg_match() with a capturing subpattern against large input, PHP crashes with a segmentation fault.
I tested this on OS X and OpenSuSE with the same result:
OS X:~$ php -v
PHP 5.2.6 (cli) (built: Jul 15 2008 12:18:21)
OpenSuSE:~# php -v
PHP 4.4.7 (cgi) (built: May 12 2008 10:19:51)
preg_match("/http:\/\/(.)+\.ru/i", str_repeat("http://google.ru", 2000));
PHP should handle the error, or do something else, rather than letting PCRE crash the interpreter.
jjohnston:~$ php -r 'preg_match("/http:\/\/(.)+\.ru/i", str_repeat("http://google.ru", 2000));'
==18470== Stack overflow in thread 1: can't grow stack to 0x7FE601FA8
==18470== Process terminating with default action of signal 11 (SIGSEGV)
==18470== Access not within mapped region at address 0x7FE601FA8
==18470== at 0x4358C0: match (pcre_exec.c:403)
==18470== Stack overflow in thread 1: can't grow stack to 0x7FE601EC8
Setting pcre.backtrack_limit = 10000 prevents the crash for me.
In my case the parentheses were unnecessary, so I removed them, which stopped the segfault.
Is there a way to handle this error from inside PHP?
If I set pcre.backtrack_limit to some other value, will it still always segfault when the limit is too low?
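Following up on the backtrack-limit workaround above, a minimal sketch of detecting the aborted match from inside PHP (the limit of 10000 is just the value mentioned above, not a recommendation):

```php
<?php
// With a finite backtrack limit, PCRE gives up before its recursive
// matcher can overflow the stack, and preg_match() returns false
// instead of taking the process down.
ini_set('pcre.backtrack_limit', '10000');

$subject = str_repeat('http://google.ru', 2000);
$result  = preg_match('/http:\/\/(.)+\.ru/i', $subject);

if ($result === false) {
    // preg_last_error() reports PREG_BACKTRACK_LIMIT_ERROR when the
    // limit is what stopped the match.
    echo 'match aborted, preg_last_error() = ' . preg_last_error() . "\n";
}
```

Note that preg_match() can return false for other reasons as well, so checking preg_last_error() is the reliable way to tell the cases apart.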
That is known PCRE behavior.
I understand that this is known PCRE behavior, but does that mean it is OK to let PCRE trigger a segmentation fault? Is there no possible way to handle this failure gracefully?
AFAIK only by using a non-stack-based evaluator (pcre_dfa_exec)... which is not compatible with Perl regular expressions; see the description at http://manpages.courier-mta.org/htmlman3/pcre_dfa_exec.3.html
I saw that PCRE has a NO_RECURSE flag to avoid the recursion in
match() (described in pcre_exec.c).
Defining NO_RECURSE at compile time avoids this problem :)
diff -u -r184.108.40.206.220.127.116.11 config0.m4
--- ext/pcre/config0.m4 2 Jun 2008 14:12:20 -0000
+++ ext/pcre/config0.m4 7 Aug 2008 11:25:39 -0000
@@ -59,7 +59,7 @@
pcrelib/pcre_refcount.c pcrelib/pcre_study.c \
pcrelib/pcre_try_flipped.c pcrelib/pcre_valid_utf8.c \
- PHP_NEW_EXTENSION(pcre, $pcrelib_sources php_pcre.c, no,,-
+ PHP_NEW_EXTENSION(pcre, $pcrelib_sources php_pcre.c, no,,-
PHP_INSTALL_HEADERS([ext/pcre], [php_pcre.h pcrelib/])
AC_DEFINE(HAVE_BUNDLED_PCRE, 1, [ ])
Just for my own personal knowledge, is the capturing subpattern the part that causes the recursion?
If I remove the () part of the pattern and keep adding 0's to the code below, I can trigger a PHP memory error but no segfault.
jjohnston:~$ php -r 'preg_match("/http:\/\/.+\.ru/i", str_repeat("http://google.ru", 2000000));'
Why is this bug's status set to "Bogus"? It seems to have been unsolved for over three years, and I haven't found a single PHP installation that is not vulnerable to this, including a quite new 5.3.8 installation...
We hit this bug today on 5.3.8, just by sending a 70 KB variable as input to preg_match_all().
Please fix this; it took us quite a lot of time to find the root cause. This should never cause a segfault. If preg_match_all() threw a standard PHP error/warning, we would have found the problem immediately. Debugging such problems is just a waste of time, and this bug report is more than three years old.
Why is the status of this "not a bug"?
Just hit this bug today as well. It would be nice if PHP handled this more
gracefully by triggering an error or warning instead of segfaulting.
I agree that something needs to be done about this bug on the php side that
doesn't involve custom compiling pcre.
In my case way back in '08, I found that the cause was misuse of capturing
subpatterns. I was using () for grouping even though I did not need to capture
subpatterns. I should really have been using (?:), which solved my issue.
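As a sketch of that fix, using the pattern and subject from the original report (shortened here), the only change is swapping the capturing group for a non-capturing (?:) group:

```php
<?php
$subject = str_repeat('http://google.ru', 10);

// Capturing: (.) records a submatch on every repetition; only the
// last repetition survives in $m[1], so $m has two entries.
preg_match('/http:\/\/(.)+\.ru/i', $subject, $m);

// Non-capturing: (?:.) groups without capturing, so $m2 holds only
// the whole match.
preg_match('/http:\/\/(?:.)+\.ru/i', $subject, $m2);
```

Whether this alone avoids the deep recursion depends on the PCRE build, but dropping captures you never read is cheap in any case.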
The problem is that there is no way to detect run-away regular expressions
without huge performance and memory penalties. Yes, we could build PCRE in a
way that it wouldn't segfault and we could crank up the default backtrack limit
to something huge, but it would slow every regex call down by a lot. If PCRE
provided a way to handle this in a more graceful manner without the performance
hit we would of course use it.
If it makes sense, you may use ungreedy quantifiers; they invert the order of backtracking, fixing your problem only in the case where a short match is found (otherwise, they will backtrack over the whole document, hitting your stack space once again).
You may also take a look at possessive quantifiers and atomic grouping; they are ways to prevent backtracking. Make sure to understand them before using them, but they are safer than just using ungreedy quantifiers to reduce backtracking volume.
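A small sketch of the trade-off described above, reusing the subject from the original report (shortened); note how the possessive form changes the result, which is exactly why these constructs need to be understood before use:

```php
<?php
$subject = str_repeat('http://google.ru', 10);

// Greedy: .+ runs to the end, then backtracks until "\.ru" matches
// the trailing ".ru".
var_dump(preg_match('/http:\/\/.+\.ru/i', $subject));     // int(1)

// Ungreedy (lazy): .+? grows one character at a time and stops at
// the first ".ru", so almost no backtracking accumulates.
var_dump(preg_match('/http:\/\/.+?\.ru/i', $subject));    // int(1)

// Possessive: .++ never gives characters back, so it swallows the
// trailing ".ru" too and the match fails.
var_dump(preg_match('/http:\/\/.++\.ru/i', $subject));    // int(0)

// Atomic group: (?>.+) behaves the same way, written differently.
var_dump(preg_match('/http:\/\/(?>.+)\.ru/i', $subject)); // int(0)
```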