Bug #45735 preg_match fails with Segmentation Fault on capturing subpattern
Submitted: 2008-08-06 15:22 UTC Modified: 2013-04-13 00:23 UTC
From: johnston dot joshua at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.2CVS, 5.3CVS, 6CVS (2008-08-06) OS: *
Private report: No CVE-ID: None
 [2008-08-06 15:22 UTC] johnston dot joshua at gmail dot com
Description:
------------
When running preg_match with a capturing subpattern against large input, PHP crashes with a segmentation fault.
I tested this on OS X and OpenSuSE, with the same result:
OS X:~$ php -v
PHP 5.2.6 (cli) (built: Jul 15 2008 12:18:21) 

OpenSuSE:~# php -v
PHP 4.4.7 (cgi) (built: May 12 2008 10:19:51)

Reproduce code:
---------------
<?php
preg_match("/http:\/\/(.)+\.ru/i", str_repeat("http://google.ru", 2000));
?>

Expected result:
----------------
PHP should handle the error in some way rather than letting PCRE crash PHP.

Actual result:
--------------
jjohnston:~$ php -r 'preg_match("/http:\/\/(.)+\.ru/i", str_repeat("http://google.ru", 2000));'
Segmentation fault

 [2008-08-06 16:36 UTC] jani@php.net
==18470== Stack overflow in thread 1: can't grow stack to 0x7FE601FA8
==18470==
==18470== Process terminating with default action of signal 11 (SIGSEGV)
==18470==  Access not within mapped region at address 0x7FE601FA8
==18470==    at 0x4358C0: match (pcre_exec.c:403)
==18470== Stack overflow in thread 1: can't grow stack to 0x7FE601EC8

 [2008-08-06 16:38 UTC] jani@php.net
Setting pcre.backtrack_limit = 10000 prevents the crash for me.
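
A minimal sketch of how the limit mentioned above could be applied from inside a script rather than in php.ini. The helper name guarded_preg_match() and its default limits are hypothetical, chosen only for illustration; pcre.backtrack_limit, pcre.recursion_limit and preg_last_error() are standard PHP. Lower limits may let PCRE give up with an error code before the stack overflows, but whether that prevents the crash depends on the build and the available stack size.

```php
<?php
// Hypothetical helper (not part of PHP): run preg_match() under temporary,
// lower PCRE limits and report how the call ended.
function guarded_preg_match(
    string $pattern,
    string $subject,
    int $backtrackLimit = 10000,   // assumed value, taken from the comment above
    int $recursionLimit = 10000    // assumed value, tune for your stack size
): array {
    // ini_set() returns the previous value (or false on failure).
    $oldBacktrack = ini_set('pcre.backtrack_limit', (string) $backtrackLimit);
    $oldRecursion = ini_set('pcre.recursion_limit', (string) $recursionLimit);

    $matches = [];
    $result = preg_match($pattern, $subject, $matches);
    $error = preg_last_error();

    // Restore the original limits so other regex calls are unaffected.
    if ($oldBacktrack !== false) {
        ini_set('pcre.backtrack_limit', $oldBacktrack);
    }
    if ($oldRecursion !== false) {
        ini_set('pcre.recursion_limit', $oldRecursion);
    }

    return ['result' => $result, 'error' => $error, 'matches' => $matches];
}

$r = guarded_preg_match('/http:\/\/(.)+\.ru/i', 'http://google.ru');
if ($r['result'] === false) {
    // preg_match() returns false when a limit was hit; the error code says which.
    if ($r['error'] === PREG_BACKTRACK_LIMIT_ERROR) {
        echo "Backtrack limit exhausted\n";
    } elseif ($r['error'] === PREG_RECURSION_LIMIT_ERROR) {
        echo "Recursion limit exhausted\n";
    }
} else {
    echo "Matched: {$r['matches'][0]}\n";
}
```

Checking the return value with === false matters here, because 0 (no match) and false (error) are different outcomes.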

 [2008-08-06 17:29 UTC] johnston dot joshua at gmail dot com
In my case the parens were unnecessary so I removed them which stopped the Seg Fault.

Is there a way to handle this error from inside php?

If I set pcre.backtrack_limit to any other value will it always seg fault if it is too low?
 [2008-08-06 17:40 UTC] felipe@php.net
That is known PCRE behavior.
See: http://manpages.courier-mta.org/htmlman3/pcrestack.3.html
 [2008-08-06 17:47 UTC] johnston dot joshua at gmail dot com
I understand it is a known pcre behavior but does that mean it is ok to let pcre trigger a segmentation fault? Is there no possible way to handle this failure gracefully?
 [2008-08-07 08:28 UTC] derick@php.net
AFAIK only by using a non-stack-based evaluator (pcre_dfa_exec)... which is not compatible with Perl regular expressions; see the description at http://manpages.courier-mta.org/htmlman3/pcre_dfa_exec.3.html
 [2008-08-07 11:27 UTC] lbarnaud@php.net
I've seen that PCRE has a NO_RECURSE flag to avoid the recursion in
match() (described in pcre_exec.c).

Defining NO_RECURSE at compile time avoids this problem :)

diff -u -r1.38.2.3.2.10.2.3 config0.m4
--- ext/pcre/config0.m4 2 Jun 2008 14:12:20 -0000      1.38.2.3.2.10.2.3
+++ ext/pcre/config0.m4 7 Aug 2008 11:25:39 -0000
@@ -59,7 +59,7 @@
                                 pcrelib/pcre_ord2utf8.c pcrelib/pcre_refcount.c pcrelib/pcre_study.c \
                                 pcrelib/pcre_tables.c pcrelib/pcre_try_flipped.c pcrelib/pcre_valid_utf8.c \
                                 pcrelib/pcre_version.c pcrelib/pcre_xclass.c"
-    PHP_NEW_EXTENSION(pcre, $pcrelib_sources php_pcre.c, no,,-
+    PHP_NEW_EXTENSION(pcre, $pcrelib_sources php_pcre.c, no,,-
     PHP_ADD_BUILD_DIR($ext_builddir/pcrelib)
     PHP_INSTALL_HEADERS([ext/pcre], [php_pcre.h pcrelib/])
     AC_DEFINE(HAVE_BUNDLED_PCRE, 1, [ ])

 [2008-08-07 13:44 UTC] johnston dot joshua at gmail dot com
Just for my own personal knowledge, is it the capturing pattern the part that causes the recursion?

If I remove the () from the pattern, I can keep adding zeros to the repeat count in the code below until I trigger a PHP memory error, but I never get a seg fault.

jjohnston:~$ php -r 'preg_match("/http:\/\/.+\.ru/i", str_repeat("http://google.ru", 2000000));'
 [2011-12-15 14:42 UTC] josiecki at silvercube dot pl
Why is this bug status set to "Bogus"? It seems to have been unsolved for over three years, and I haven't found a single PHP installation that is not vulnerable to this, including a quite new 5.3.8 installation...
 [2011-12-23 13:09 UTC] vojtech dot kurka at gmail dot com
We have hit this bug today on 5.3.8, just sending 70KB variable as an input to preg_match_all().

Please fix this; it took us quite a lot of time to find the root cause. This should never cause a segfault. If the preg_match_all() function threw a standard PHP error/warning, we would have found the problem immediately. Debugging such problems is just a waste of time, and this bug report is more than 3 years old.
 [2013-04-11 16:35 UTC] michael at writhem dot com
Why is this status "Not a bug"?
 [2013-04-12 18:40 UTC] josh dot adell at gmail dot com
Just hit this bug today as well. It would be nice if PHP handled this more 
gracefully by triggering an error or warning instead of segfaulting.
 [2013-04-12 18:51 UTC] johnston dot joshua at gmail dot com
I agree that something needs to be done about this bug on the php side that 
doesn't involve custom compiling pcre.

In my case way back in '08, I found that the cause was misuse of capturing 
subpatterns. I was using () for grouping even though I did not need to match 
subpatterns. In reality I should have been using (?:), which solved my issue.
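
A small sketch of the difference described above, using a short input so it runs safely. The variable names are illustrative only. It shows that (?:...) groups without recording a subpattern. Whether that also avoids the stack overflow on large input depends on the PCRE build, so treat it as the reporter's observed workaround rather than a guarantee.

```php
<?php
$subject = 'http://google.ru';

// Capturing group: each repetition of (.) overwrites capture 1,
// so only the last matched character is kept.
preg_match('/http:\/\/(.)+\.ru/i', $subject, $capturing);

// Non-capturing group: (?:.) groups the same way but stores nothing.
preg_match('/http:\/\/(?:.)+\.ru/i', $subject, $nonCapturing);

// No group at all: equivalent here, since the group wraps a single token.
preg_match('/http:\/\/.+\.ru/i', $subject, $plain);

var_dump(count($capturing));     // full match plus one captured subpattern
var_dump(count($nonCapturing));  // full match only
var_dump($nonCapturing[0] === $plain[0]);
```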
 [2013-04-13 00:23 UTC] rasmus@php.net
The problem is that there is no way to detect run-away regular expressions 
here without huge performance and memory penalties. Yes, we could build PCRE in a 
way that it wouldn't segfault and we could crank up the default backtrack limit 
to something huge, but it would slow every regex call down by a lot. If PCRE 
provided a way to handle this in a more graceful manner without the performance 
hit we would of course use it.
 [2014-03-06 15:55 UTC] php at mandark dot fr
If it makes sense for your pattern, you may use ungreedy quantifiers. They reverse the order of backtracking, which fixes the problem only when a short match is found (otherwise the engine backtracks through the whole document and exhausts the stack space again).

You may also take a look at possessive quantifiers and atomic grouping, which prevent backtracking. Make sure you understand them before using them, but they are safer than just using ungreedy quantifiers to reduce the amount of backtracking.
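
The three techniques named above can be sketched on a short input as follows. The subject string and the character class [^.\s] are assumptions made for this illustration and are not taken from the report. Note that possessive and atomic forms change what a pattern can match, as the last line shows.

```php
<?php
$subject = 'http://google.ru http://yandex.ru';

// Greedy: .+ runs to the end of the line, then backtracks to the last ".ru".
preg_match('/http:\/\/.+\.ru/i', $subject, $greedy);

// Ungreedy (lazy): .+? grows one character at a time and stops at the first ".ru".
preg_match('/http:\/\/.+?\.ru/i', $subject, $lazy);

// Possessive: [^.\s]++ never gives back what it has consumed.
preg_match('/http:\/\/[^.\s]++\.ru/i', $subject, $possessive);

// Atomic group: (?>...) has the same no-backtracking effect as the possessive form.
preg_match('/http:\/\/(?>[^.\s]+)\.ru/i', $subject, $atomic);

// Caveat: a++ takes every "a" and refuses to release one for the trailing "a",
// so this pattern cannot match at all.
$neverMatches = preg_match('/a++a/', 'aaa');

echo $greedy[0], "\n", $lazy[0], "\n", $possessive[0], "\n", $atomic[0], "\n";
var_dump($neverMatches);
```

The possessive and atomic variants only work here because the character class excludes the dot. With a plain .++ the quantifier would swallow ".ru" as well and the match would fail.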
 