Bug #48153 preg_replace() crashes in function "match"
Submitted: 2009-05-05 16:22 UTC Modified: 2009-05-06 23:37 UTC
Votes:1
Avg. Score:4.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:0 (0.0%)
From: raul dot gigea at directmedia dot de Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.2CVS-2009-05-06 (snap) OS: FreeBSD 7.1-RELEASE-p4
Private report: No CVE-ID: None

 [2009-05-05 16:22 UTC] raul dot gigea at directmedia dot de
Description:
------------
Hi, I managed to reproduce this bug on various clean FreeBSD systems
with PHP version 5.2.9, then compiled today's CVS snapshot, and the
bug is still reproducible.

Memory limit was 128MB which should be more than enough for the code 
that produced the crash.

Stack size was 64 MB (ulimit -s).

5.2.6 works fine. Also, disabling mhash extension fixes the problem.

Reducing pcre.recursion_limit and pcre.backtrack_limit in php.ini also
avoids the segfault, but creates a new problem: preg_replace() no longer
produces the expected result. (They have to be reduced to about 50 for
the segfault not to occur.)
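With the limits lowered this far, preg_replace() gives up instead of crashing: it returns NULL, and preg_last_error() reports which limit was hit. A minimal sketch to make that visible, assuming PHP 5.2 or later (where preg_last_error() and the PREG_*_LIMIT_ERROR constants exist). The value 50 mirrors the figure above; the pcre.jit line is an assumption that only matters on PHP 7+ builds, where the JIT does not use pcre.recursion_limit.

```php
<?php
// Assumption: on PHP 7+ the PCRE JIT ignores pcre.recursion_limit, so switch
// it off here. PHP 5.2/5.3 has no such setting; ini_set() just returns false.
ini_set('pcre.jit', '0');

// Lower both limits to roughly the value mentioned in the report.
ini_set('pcre.recursion_limit', '50');
ini_set('pcre.backtrack_limit', '50');

$contents = 'd' . str_repeat('a', 1900) . 'b';
$result = preg_replace('/d(a)+b/', '', $contents);

if ($result === null) {
    // No segfault, but no replacement either: PCRE stopped early.
    $error = preg_last_error();
    if ($error === PREG_RECURSION_LIMIT_ERROR) {
        echo "preg_replace() failed: pcre.recursion_limit exhausted\n";
    } elseif ($error === PREG_BACKTRACK_LIMIT_ERROR) {
        echo "preg_replace() failed: pcre.backtrack_limit exhausted\n";
    } else {
        echo "preg_replace() failed with PCRE error code $error\n";
    }
} else {
    var_dump($result);
}
```

This is also why echoing the result "prints nothing": echo of NULL outputs an empty string.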

Details follow.

Configure Line: ./configure --with-mhash
List of Modules:
./sapi/cli/php -m            
[PHP Modules]
ctype
date
dom
filter
hash
iconv
json
libxml
mhash
pcre
PDO
pdo_sqlite
posix
Reflection
session
SimpleXML
SPL
SQLite
standard
tokenizer
xml
xmlreader
xmlwriter

[Zend Modules]


Reproduce code:
---------------
$contents = 'd' . str_repeat('a', 1900) . 'b';
$contents = preg_replace('/d(a)+b/', '', $contents);


Expected result:
----------------
$contents should be empty.

Actual result:
--------------
segmentation fault (core dumped)  ./sapi/cli/php ~/test.php

---------
Backtrace
---------

... the backtrace starts with 1400 identical calls to match() (stack overflow) ...
#1400 0x0808e030 in match (eptr=0x28923db9 'a' <repeats 200 times>..., ecode=0x28a1666d "_", mstart=0x28923db8 "d", 'a' <repeats 199 times>..., offset_top=2, md=0xbfbfcc30, ims=0, eptrb=0x0, flags=0,
    rdepth=0) at /usr/home/raul/php-snapshot/php5.2-200905051430/ext/pcre/pcrelib/pcre_exec.c:720
#1401 0x0809b118 in php_pcre_exec (argument_re=0x28a16640, extra_data=0xbfbfcd80, subject=0x28923db8 "d", 'a' <repeats 199 times>..., length=1902, start_offset=0, options=0, offsets=0x2892253c, offsetcount=6)
    at /usr/home/raul/php-snapshot/php5.2-200905051430/ext/pcre/pcrelib/pcre_exec.c:4895
#1402 0x080a10e6 in php_pcre_replace_impl (pce=0x28a15d40, subject=0x28923db8 "d", 'a' <repeats 199 times>..., subject_len=1902, replace_val=0x28921db4, is_callable_replace=0, result_len=0xbfbfcf04,
    limit=-1, replace_count=0x0) at /usr/home/raul/php-snapshot/php5.2-200905051430/ext/pcre/php_pcre.c:1045
#1403 0x080a0e86 in php_pcre_replace (regex=0x28923d80 "/d(a)+b/", regex_len=8, subject=0x28923db8 "d", 'a' <repeats 199 times>..., subject_len=1902, replace_val=0x28921db4, is_callable_replace=0,
    result_len=0xbfbfcf04, limit=-1, replace_count=0x0) at /usr/home/raul/php-snapshot/php5.2-200905051430/ext/pcre/php_pcre.c:955
#1404 0x080a1d66 in php_replace_in_subject (regex=0x28921e2c, replace=0x28921db4, subject=0x28917a60, result_len=0xbfbfcf04, limit=-1, is_callable_replace=0 '\0', replace_count=0x0)
    at /usr/home/raul/php-snapshot/php5.2-200905051430/ext/pcre/php_pcre.c:1272
#1405 0x080a2684 in preg_replace_impl (ht=3, return_value=0x28923378, return_value_ptr=0x0, this_ptr=0x0, return_value_used=1, is_callable_replace=0 '\0')
    at /usr/home/raul/php-snapshot/php5.2-200905051430/ext/pcre/php_pcre.c:1370
#1406 0x080a2735 in zif_preg_replace (ht=3, return_value=0x28923378, return_value_ptr=0x0, this_ptr=0x0, return_value_used=1) at /usr/home/raul/php-snapshot/php5.2-200905051430/ext/pcre/php_pcre.c:1386
#1407 0x083373d0 in zend_do_fcall_common_helper_SPEC (execute_data=0xbfbfd128) at zend_vm_execute.h:200
#1408 0x0833cff9 in ZEND_DO_FCALL_SPEC_CONST_HANDLER (execute_data=0xbfbfd128) at zend_vm_execute.h:1739
#1409 0x08336f22 in execute (op_array=0x28922580) at zend_vm_execute.h:92
#1410 0x08311592 in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /usr/home/raul/php-snapshot/php5.2-200905051430/Zend/zend.c:1134
#1411 0x082bd926 in php_execute_script (primary_file=0xbfbfe7d4) at /usr/home/raul/php-snapshot/php5.2-200905051430/main/main.c:2025
#1412 0x08390d62 in main (argc=2, argv=0xbfbfe8bc) at /usr/home/raul/php-snapshot/php5.2-200905051430/sapi/cli/php_cli.c:1162

-----------------------------------------------------------------
The match function recursively calls itself until stack overflow:
-----------------------------------------------------------------

#1400 0x0808e030 in match (eptr=0x28923db9 'a' <repeats 200 times>..., ecode=0x28a1666d "_", mstart=0x28923db8 "d", 'a' <repeats 199 times>..., offset_top=2, md=0xbfbfcc30, ims=0, eptrb=0x0, flags=0,
    rdepth=0) at /usr/home/raul/php-snapshot/php5.2-200905051430/ext/pcre/pcrelib/pcre_exec.c:720
720             RMATCH(eptr, ecode + _pcre_OP_lengths[*ecode], offset_top, md,
(gdb) frame 1399
#1399 0x0808f793 in match (eptr=0x28923dba 'a' <repeats 200 times>..., ecode=0x28a16674 "V", mstart=0x28923db8 "d", 'a' <repeats 199 times>..., offset_top=4, md=0xbfbfcc30, ims=0, eptrb=0x0, flags=0,
    rdepth=1) at /usr/home/raul/php-snapshot/php5.2-200905051430/ext/pcre/pcrelib/pcre_exec.c:1361
1361          RMATCH(eptr, prev, offset_top, md, ims, eptrb, flags, RM13);
(gdb) frame 1398
#1398 0x0808e030 in match (eptr=0x28923dba 'a' <repeats 200 times>..., ecode=0x28a1666d "_", mstart=0x28923db8 "d", 'a' <repeats 199 times>..., offset_top=4, md=0xbfbfcc30, ims=0, eptrb=0x0, flags=0,
    rdepth=2) at /usr/home/raul/php-snapshot/php5.2-200905051430/ext/pcre/pcrelib/pcre_exec.c:720
720             RMATCH(eptr, ecode + _pcre_OP_lengths[*ecode], offset_top, md,
----------------------------------------------------------------------

More info on php_pcre_exec:

(gdb) frame 1401
#1401 0x0809b118 in php_pcre_exec (argument_re=0x28a16640, extra_data=0xbfbfcd80, subject=0x28923db8 "d", 'a' <repeats 199 times>..., length=1902, start_offset=0, options=0, offsets=0x2892253c, offsetcount=6)
    at /usr/home/raul/php-snapshot/php5.2-200905051430/ext/pcre/pcrelib/pcre_exec.c:4895
4895      rc = match(start_match, md->start_code, start_match, 2, md, ims, NULL, 0, 0);
(gdb) p start_match
$1 = (const unsigned char *) 0x28923db8 "d", 'a' <repeats 199 times>...
(gdb) p md->start_code
$2 = (const uschar *) 0x28a16668 "^"
(gdb) p md
$3 = (match_data *) 0xbfbfcc30
(gdb) p *md
$4 = {match_call_count = 1401, match_limit = 100000, match_limit_recursion = 100000, offset_vector = 0x2892253c, offset_end = 6, offset_max = 4, nltype = 0, nllen = 1, nl = "\n\000\000", lcc = 0x8392880 "",
  ctypes = 0x8392bc0 "\200", offset_overflow = 0, notbol = 0, noteol = 0, utf8 = 0, jscript_compat = 0, endonly = 0, notempty = 0, partial = 0, hitend = 0, bsr_anycrlf = 0, start_code = 0x28a16668 "^",
  start_subject = 0x28923db8 "d", 'a' <repeats 199 times>..., end_subject = 0x28924526 "", start_match_ptr = 0x28923db8 "d", 'a' <repeats 199 times>..., end_match_ptr = 0xf0c <Address 0xf0c out of bounds>,
  end_offset_top = 199340, capture_last = 1, start_offset = 0, eptrchain = 0x0, eptrn = 0, recursive = 0x0, callout_data = 0x0}
(gdb) p ims
$5 = 0

Hope this helps. Mail me for the full core file.




 [2009-05-05 18:36 UTC] fa@php.net
Verified with FreeBSD 7.1-RELEASE-p4 and the current snapshot: "Segmentation fault: 11 (core dumped)".
I didn't examine the core dump in detail, though.

HEAD works fine.
 [2009-05-06 18:37 UTC] jani@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows:

  http://windows.php.net/snapshots/

I cannot reproduce this on Linux with the latest PHP_5_2 snapshot using
this configure line:

'/usr/src/php-5.2CVS/configure' \
'--disable-all' \
'--enable-debug' \
'--disable-reflection' \
'--disable-cgi' \
'--with-curl' \
'--with-curlwrappers' \
'--with-pcre-regex' \
'--with-mhash'

 [2009-05-06 20:44 UTC] raul dot gigea at directmedia dot de
OK, I tried using the CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
(By the way, it was only one day newer than the one I had tried.)

I used your configure line:

'configure' \
'--disable-all' \
'--enable-debug' \
'--disable-reflection' \
'--disable-cgi' \
'--with-curl' \
'--with-curlwrappers' \
'--with-pcre-regex' \
'--with-mhash'

And it's still segfaulting.
 [2009-05-06 20:53 UTC] jani@php.net
You need to increase the stack size. I tried with 'ulimit -s 1024' and
that does not crash.
 [2009-05-06 21:11 UTC] raul dot gigea at directmedia dot de
I already wrote that my stack size was 64 MB. ulimit -s 1024 would
reduce it to only 1 MB. 64 MB is the upper limit without recompiling
the kernel. And I don't think that this specific regex should eat up
64 MB of stack...

Anyway, here's the result:

% ulimit -s 1024           
% ulimit -s
1024
% ./sapi/cli/php ~/test.php
zsh: segmentation fault (core dumped)  ./sapi/cli/php ~/test.php
% ulimit -s 65535
% ulimit -s
65535
% ./sapi/cli/php ~/test.php
zsh: segmentation fault (core dumped)  ./sapi/cli/php ~/test.php
% ulimit -s 65537
ulimit: value exceeds hard limit
 [2009-05-06 21:20 UTC] jani@php.net
You either have to increase the stack size or tune pcre.recursion_limit 
and pcre.backtrack_limit properly.
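If only one call needs special limits, they can be changed around that call and restored afterwards, so the tuning does not leak into other regex calls. A minimal sketch; preg_replace_with_limits() is a hypothetical helper, and 10000/100000 are placeholder values, not recommendations. A recursion limit the stack cannot sustain still crashes, and one that is too low makes the call return NULL, as described earlier in this report.

```php
<?php
/**
 * Hypothetical helper: run preg_replace() with temporary PCRE limits and
 * restore the previous settings afterwards. Returns whatever preg_replace()
 * returns (NULL if one of the limits was hit).
 */
function preg_replace_with_limits($pattern, $replacement, $subject,
                                  $recursionLimit, $backtrackLimit)
{
    // ini_set() returns the previous value, which is kept for restoring.
    $oldRecursion = ini_set('pcre.recursion_limit', (string) $recursionLimit);
    $oldBacktrack = ini_set('pcre.backtrack_limit', (string) $backtrackLimit);

    $result = preg_replace($pattern, $replacement, $subject);

    ini_set('pcre.recursion_limit', $oldRecursion);
    ini_set('pcre.backtrack_limit', $oldBacktrack);

    return $result;
}

// Assumption: on PHP 7+ the JIT ignores pcre.recursion_limit; turn it off so
// the limits apply. PHP 5.2/5.3 has no such setting (ini_set() returns false).
ini_set('pcre.jit', '0');

$contents = 'sud' . str_repeat('a', 1900) . 'bccess';
$result = preg_replace_with_limits('/d(a)+b/', '', $contents, 10000, 100000);
var_dump($result);
```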
 [2009-05-06 22:09 UTC] raul dot gigea at directmedia dot de
I already described this in my original bug report, but I'll try to be
more explicit:
If I tune pcre.recursion_limit and pcre.backtrack_limit, then I get no
segfault, but I don't get the correct output either. As an example, this
code:

  $contents = 'sud' . str_repeat('a', 1900) . 'bccess';
  $contents = preg_replace('/d(a)+b/', '', $contents);
  echo $contents;

Segfaults if recursion_limit/backtrack_limit is too high. Prints 
"success" if everything went well, and prints nothing if 
recursion_limit is too low.

By tuning those two parameters I can only get it to print nothing (above
backtrack_limit 2458 it crashes, below 2457 it prints nothing). So it
works around the segfault, but creates another problem: you don't get
the intended result.

I suspect the problem could be in the mhash library, and the way php 
uses it, because it prints 'success' if I disable the mhash extension.
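An alternative that avoids depending on the limits at all, offered as an untested assumption rather than something verified in this thread: the empty replacement never uses the captured 'a', so the group can be dropped. As far as we understand PCRE's interpreter, a repeated single character such as a+ is consumed in a loop, whereas each repetition of a parenthesized group such as (a)+ adds nested match() calls (the alternating frames in the backtrace above), so the rewritten pattern should not need stack proportional to the input.

```php
<?php
// Same input as in the comment above.
$contents = 'sud' . str_repeat('a', 1900) . 'bccess';

// Hypothetical rewrite: 'a+' instead of '(a)+'. The empty replacement never
// refers to the captured group, so the matched text and the result are the same.
$contents = preg_replace('/da+b/', '', $contents);
echo $contents, "\n"; // prints "success"
```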
 [2009-05-06 22:18 UTC] raul dot gigea at directmedia dot de
By the way, the mhash lib version is 0.9.9.
 [2009-05-06 22:30 UTC] scottmac@php.net
The mhash library is gone in 5.3 and replaced with a wrapper around the hash library.

Can you try a 5.3 snapshot and see if you get the issue?

I should say I can't reproduce this with 5.2 on OS X using the same configure line Jani used.
 [2009-05-06 22:54 UTC] raul dot gigea at directmedia dot de
Just tried it: with the 5.3 snapshot it works.

Compiling 5.2 on OS X right now.
 [2009-05-06 23:23 UTC] raul dot gigea at directmedia dot de
OK, I tried it on OS X. It doesn't crash with 1900 'a's, but it crashes
with 2900.

Try this code on OS X with 5.2CVS; it crashes on my MacBook with the
latest libmhash from MacPorts as of today (0.9.9.9_0):

$contents = 'sud' . str_repeat('a', 2900) . 'bccess';
$contents = preg_replace('/d(a)+b/', '', $contents);
echo $contents;
 [2009-05-06 23:37 UTC] raul dot gigea at directmedia dot de
By the way, with more than 30000 a's it segfaults with 5.3 too.

$contents = 'sud' . str_repeat('a', 30000) . 'bccess';
$contents = preg_replace('/d(a)+b/', '', $contents);
echo $contents;
 [2010-06-18 20:38 UTC] bit2 at freemail dot hu
I experienced this bug with Debian 5.0.4 (2.6.26-2-686 #1 SMP i686), PHP 5.2.6-1+lenny8, PCRE 7.6 2008-01-28, using mod_fcgid (libapache2-mod-fcgid 2.2-1) and the default stack size (ulimit -s 8192, i.e. 8 MB, since ulimit -s counts in kilobytes).

The sample code of Raul segfaults for me only with an input of ~3300 characters. I've simplified the code a bit further. The following causes a segfault for me every time I run it:

$contents = str_repeat('a', 3396);
$contents = preg_replace('/(.)*/', '', $contents);

Playing with stack size or pcre limits (recursion and/or backtrack) works around the problem, just as Raul described.

I could live with this "limitation" (or bug, whatever) if PHP didn't just segfault but threw an error describing what happened. A segfault doesn't help much, and it took me a few hours to get to the root of the problem. :-(

PS: If I understand it right, the segfault happens because the stack fills up. The bundled PCRE backtracks by calling its internal match() function recursively, and every repetition of a parenthesized sub-pattern adds nested match() calls. In our example every character of the input string is one repetition of that sub-pattern, so the recursion depth grows with the input length until the stack limit is reached (a few other things were already on the stack when the PCRE function started).
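PHP cannot recover from a C stack overflow, but when pcre.recursion_limit is low enough that PCRE stops before the stack runs out, the silent NULL return can at least be turned into a descriptive error, as wished for above. A minimal sketch; preg_replace_or_throw() is a hypothetical name, PHP 5.2+ is assumed for preg_last_error(), and the input reuses the simplified /(.)*/ example from this comment.

```php
<?php
/**
 * Hypothetical wrapper: like preg_replace() on a single string subject, but
 * throws instead of returning NULL when PCRE gives up. It cannot help once
 * the process has already overflowed the C stack; it only reports failures
 * that PCRE detects itself via pcre.recursion_limit / pcre.backtrack_limit.
 */
function preg_replace_or_throw($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result !== null) {
        return $result;
    }

    $error = preg_last_error();
    if ($error === PREG_RECURSION_LIMIT_ERROR) {
        $reason = 'pcre.recursion_limit exceeded';
    } elseif ($error === PREG_BACKTRACK_LIMIT_ERROR) {
        $reason = 'pcre.backtrack_limit exceeded';
    } else {
        $reason = 'PCRE error code ' . $error;
    }
    throw new RuntimeException("preg_replace() with $pattern failed: $reason");
}

// Assumption: on PHP 7+ the JIT ignores pcre.recursion_limit; turn it off.
ini_set('pcre.jit', '0');
// Deliberately low limit so PCRE stops long before the stack is exhausted.
ini_set('pcre.recursion_limit', '50');

$caught = false;
try {
    preg_replace_or_throw('/(.)*/', '', str_repeat('a', 3396));
} catch (RuntimeException $e) {
    $caught = true;
    echo $e->getMessage(), "\n";
}
```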
 