[2008-07-17 16:31 UTC] kaiser at macbureau dot de
Description:
------------
PCRE with UTF-8 (Typo3 Mailform) kills the Apache child process, leaving the
following entry in the Apache error log on FreeBSD 7 with Apache 2.2.8:
[notice] child pid 6709 exit signal Illegal instruction (4)
Output of ulimit -a:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) 33554432
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 11095
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 524288
cpu time (seconds, -t) unlimited
max user processes (-u) 5547
virtual memory (kbytes, -v) unlimited
Reproduce code:
---------------
#!/usr/local/bin/php
<?php
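// Returns true if $str consists only of well-formed UTF-8 sequences.
function is_utf8($str) {
    return (preg_match('/^([\x00-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xec][\x80-\xbf]{2}|\xed[\x80-\x9f][\x80-\xbf]|[\xee-\xef][\x80-\xbf]{2}|\xf0[\x90-\xbf][\x80-\xbf]{2}|[\xf1-\xf3][\x80-\xbf]{3}|\xf4[\x80-\x8f][\x80-\xbf]{2})*$/', $str) === 1);
}

// Build a 5000-character ASCII string.
$i = 0;
$str = '';
while ($i < 5000) {
    $str .= 'a';
    $i++;
}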
is_utf8($str);
?>
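As a non-regex cross-check, here is a minimal sketch of a hypothetical is_utf8_loop() (the name is not from the report). It walks the string byte by byte and applies the same lead-byte and continuation-byte ranges as the is_utf8() pattern, reading the F0 alternative as \xf0. Because it is an ordinary loop, its stack use does not grow with the length of $str. It is an illustration under those assumptions, not a drop-in replacement tested against every PHP version.

```php
<?php
// Hypothetical helper (not part of the report): accepts the same byte
// sequences as the is_utf8() pattern, with the F0 alternative read as \xf0.
// It is an ordinary loop, so stack use does not grow with strlen($str).
function is_utf8_loop($str)
{
    $len = strlen($str);
    $i = 0;
    while ($i < $len) {
        $b = ord($str[$i]);
        if ($b <= 0x7F) {                     // [\x00-\x7f]: single byte
            $i++;
            continue;
        }
        // Allowed range of the 2nd byte ($lo..$hi) and number of
        // continuation bytes ($n), one branch per pattern alternative.
        if ($b >= 0xC2 && $b <= 0xDF) {
            $lo = 0x80; $hi = 0xBF; $n = 1;
        } elseif ($b == 0xE0) {
            $lo = 0xA0; $hi = 0xBF; $n = 2;   // no overlong 3-byte forms
        } elseif (($b >= 0xE1 && $b <= 0xEC) || $b == 0xEE || $b == 0xEF) {
            $lo = 0x80; $hi = 0xBF; $n = 2;
        } elseif ($b == 0xED) {
            $lo = 0x80; $hi = 0x9F; $n = 2;   // no UTF-16 surrogates
        } elseif ($b == 0xF0) {
            $lo = 0x90; $hi = 0xBF; $n = 3;   // no overlong 4-byte forms
        } elseif ($b >= 0xF1 && $b <= 0xF3) {
            $lo = 0x80; $hi = 0xBF; $n = 3;
        } elseif ($b == 0xF4) {
            $lo = 0x80; $hi = 0x8F; $n = 3;   // nothing above U+10FFFF
        } else {
            return false;                     // 0x80-0xC1 or 0xF5-0xFF as lead byte
        }
        if ($i + $n >= $len) {
            return false;                     // sequence truncated at end of string
        }
        $second = ord($str[$i + 1]);
        if ($second < $lo || $second > $hi) {
            return false;
        }
        for ($k = 2; $k <= $n; $k++) {        // remaining bytes: 0x80-0xBF
            $c = ord($str[$i + $k]);
            if ($c < 0x80 || $c > 0xBF) {
                return false;
            }
        }
        $i += $n + 1;
    }
    return true;
}
```

For example, is_utf8_loop(str_repeat('a', 5000)) returns true without any recursion. Where the /u modifier is available, preg_match('//u', $str) === 1 is a shorter check, because PCRE validates a UTF-8 subject before matching; whether it rejects exactly the same sequences (surrogates, for instance) depends on the PCRE version in use, so treat that as an assumption to verify.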
Copyright © 2001-2025 The PHP Group. All rights reserved.
Last updated: Fri Oct 24 04:00:01 2025 UTC |
Sorry, copy-and-paste error. Thanks, looking forward to hearing from you.

./test.php
Segmentation fault (core dumped)

#!/usr/local/bin/php
<?php
function is_utf8($str) {
    return (preg_match('/^([\x00-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xec][\x80-\xbf]{2}|\xed[\x80-\x9f][\x80-\xbf]|[\xee-\xef][\x80-\xbf]{2}|\xf0[\x90-\xbf][\x80-\xbf]{2}|[\xf1-\xf3][\x80-\xbf]{3}|\xf4[\x80-\x8f][\x80-\xbf]{2})*$/', $str) === 1);
}
$i = 0;
$str = '';
while ($i < 5000) {
    $str .= 'a';
    $i++;
}
is_utf8($str);
?>

I reproduced this on FreeBSD 7.0 + Apache/2.2.9 + PHP/5.2.6 (bundled PCRE) with this script:

<?php
$str = str_repeat('a', 10000);
$utf8 = (preg_match("/^([\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})*$/", $str)) ? "yes" : "no";
echo $utf8;
?>

Under mod_php, the Apache log shows:
[notice] child pid 54586 exit signal Illegal instruction (4)
In the CLI it works fine!

I've built PHP 5.2.8 with debugging enabled and ran the following script through the CLI under gdb:

<?php
$str = str_repeat('a', 1244);
$utf8 = (preg_match("/^([\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})*$/", $str)) ? "yes " : "no";
echo $utf8;
?>

It's important to note that if I change the str_repeat() length from 1244 to 1243, the segfault doesn't happen.
The system limits:

Resource limits (current):
  cputime        infinity secs
  filesize       infinity kB
  datasize       786432 kB
  stacksize      131072 kB
  coredumpsize   infinity kB
  memoryuse      infinity kB
  memorylocked   infinity kB
  maxprocesses   5547
  openfiles      11095
  sbsize         infinity bytes
  vmemoryuse     infinity kB

Anyway, the results of the gdb backtrace are here (~790 KB file):
http://www.malkavian.com/~jdc/php.bug45546.backtrace.txt

Hope this helps.

Two gdb examples:

gdb66:
Program received signal SIGSEGV, Segmentation fault.
match (
    eptr=0x29385a68 "3'\";\n$select[] = \"SELECT p1.id, nick, p1.creation_date, p1.modification_date, p1.post_title, p1.post_text, p1.parent_post_id, p2.post_title AS parent_post_title, p3.post_title AS answer_parent_post_ti"...,
    ecode=0x28f160ed "\034\"T",
    mstart=0x293854bc "<?php\n$select = array();\n$select[] = \"SELECT uni_files.id, name, disk_filename, icon, size FROM uni_files INNER JOIN uni_filetypes ON uni_files.filetype_id=uni_filetypes.id WHERE post_id='167' AND blo"...,
    offset_top=4, md=0xbfbef000, ims=6, eptrb=0x0, flags=0, rdepth=1362)
    at /usr/ports/lang/php5/work/php-5.2.8/ext/pcre/pcrelib/pcre_exec.c:580
580         prop_value = 0;

and

0x2863b28a in match (
    eptr=0x2940b64f "?аМ202М214, даже М201М200еднемМ203 клаМ201М201М203>, ?00\223 заМ217вил ?232М203ниМ206М213н.? даже М201М200еднемМ203 клаМ201М201М203>, ?00\223 заМ217вил ?232М203ниМ206М213н.? </p><p><?222М213 знаеМ202е, М207М202о ?...,
    ecode=0x28ef03bb "\034'U",
    mstart=0x2940b398 "'<p>?237о мнениМ216 ?232М203ниМ206М213на, кМ200М213мМ201кие влаМ201М202и должнМ213 даМ202М214 возможноМ201М202М214 М201М200еднемМ203 клаМ201М201М203 капиМ202ализиМ200оваМ202М214 иМ205 М201беМ200ежен?...,
    offset_top=4, md=0xbfbf89d0, ims=0, eptrb=0xbfa006a0, flags=2, rdepth=1388)
    at /usr/ports/lang/php5/work/php-5.2.8/ext/pcre/pcrelib/pcre_exec.c:2160
2160    /usr/ports/lang/php5/work/php-5.2.8/ext/pcre/pcrelib/pcre_exec.c: No such file or directory.
        in /usr/ports/lang/php5/work/php-5.2.8/ext/pcre/pcrelib/pcre_exec.c

There isn't a whole lot we can do about this. PCRE internally calls match() recursively in some circumstances. We have the pcre.recursion_limit setting to prevent PCRE from eating all available stack for really nasty regular expressions, but depending on the expression and how you set that limit, you can still write a regex that will cause PCRE to eat your entire stack. For example, take one of the reproducing scripts in this report and modify it slightly to pass in the number of a's in $str:

<?php
$str = str_repeat('a', $argv[1]);
$utf8 = (preg_match("/^([\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})*$/", $str)) ? "yes " : "no";
echo $utf8."\n";
if (preg_last_error() == PREG_RECURSION_LIMIT_ERROR) {
    print 'Recursion limit was exhausted!';
}

Then run it with various lengths and recursion limits:

% php -d pcre.recursion_limit=1000 g 100
yes

So, with a recursion limit of 1000 and just 100 a's, we are fine. But increase it to 1000 a's and we get:

% php -d pcre.recursion_limit=1000 g 1000
no
Backtrack limit was exhausted!

At a certain point you will be able to smash your entire stack if you set your limit high enough:

% php -d pcre.recursion_limit=100000 g 10000
Segmentation fault (core dumped)

There are some really, really slow checks we could do to detect this, but we would have to run them for every regex, and we don't feel the performance hit is worth it. There are usually ways to rewrite a regex so that it doesn't need to recurse indefinitely like this. If someone has a decent way to fix this that doesn't slow down every match by a lot, please send us a patch, but until then I would suggest fixing your regexes. This is either a "Won't fix" or "Not a bug", although neither really describes the situation.
It is more like a "Can't fix in a sane way" situation.
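Building on the preg_last_error() check in the comment above, here is a hedged sketch of a wrapper that keeps a PCRE abort from reading as "no match" and names the limit that was actually hit, so a recursion-limit failure is not reported as a backtrack-limit one. The helper names preg_error_name() and checked_match() are hypothetical; the PREG_* constants are PHP's own, and PREG_BAD_UTF8_OFFSET_ERROR and PREG_JIT_STACKLIMIT_ERROR are guarded with defined() because they only exist in later PHP versions. As explained above, this only helps when pcre.recursion_limit is reached before the process stack runs out; it cannot catch a segfault.

```php
<?php
// Hypothetical helper: map a preg_last_error() code to a readable name.
function preg_error_name($code)
{
    $names = array(
        PREG_NO_ERROR              => 'no error',
        PREG_INTERNAL_ERROR        => 'internal error',
        PREG_BACKTRACK_LIMIT_ERROR => 'backtrack limit exhausted',
        PREG_RECURSION_LIMIT_ERROR => 'recursion limit exhausted',
        PREG_BAD_UTF8_ERROR        => 'malformed UTF-8 subject',
    );
    // Constants added in later PHP versions; only register them if present.
    if (defined('PREG_BAD_UTF8_OFFSET_ERROR')) {
        $names[PREG_BAD_UTF8_OFFSET_ERROR] = 'bad UTF-8 offset';
    }
    if (defined('PREG_JIT_STACKLIMIT_ERROR')) {
        $names[PREG_JIT_STACKLIMIT_ERROR] = 'JIT stack limit exhausted';
    }
    return isset($names[$code]) ? $names[$code] : 'unknown error ' . $code;
}

// Hypothetical wrapper: true for a match, false for no match, and an
// exception when preg_match() returns false because PCRE gave up.
function checked_match($pattern, $subject)
{
    $r = preg_match($pattern, $subject);
    if ($r === false) {
        throw new RuntimeException(
            'preg_match failed: ' . preg_error_name(preg_last_error())
        );
    }
    return $r === 1;
}
```

With this wrapper, the second run in the comment above would raise "recursion limit exhausted" instead of printing "no", assuming the limit is reached before the stack is exhausted.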