php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #69864 Segfault in preg_replace_callback
Submitted: 2015-06-17 16:42 UTC Modified: 2015-06-23 11:04 UTC
From: james dot h dot cracknell at gmail dot com Assigned: cmb (profile)
Status: Closed Package: PCRE related
PHP Version: 7.0.0alpha1 OS: Windows Server 2008 R2
Private report: No CVE-ID: None
 [2015-06-17 16:42 UTC] james dot h dot cracknell at gmail dot com
Description:
------------
Note the following in the provided stack trace:

struct real_pcre * argument_re = 0x00000001

I believe what is occurring here is if you use `preg_replace_callback` with a callback making (very) extensive use of PCRE functions, the outer regular expression gets evicted from the PCRE cache and freed before `preg_replace_callback` completes.

Looking at `pcre_get_compiled_regex_cache`, it looks as though no checks are performed to prevent this from happening:
https://github.com/php/php-src/blob/php-7.0.0alpha1/ext/pcre/php_pcre.c#L446

Switching to an approach using `preg_match_all` with manual replacement obviously sidesteps the issue.

Stack trace follows, less the subject string value:

php7!php_pcre_exec(struct real_pcre * argument_re = 0x00000001, struct pcre_extra * extra_data = 0x0d6074b8, char * subject = 0x0598fa10 "(snip)", int length = 0n1235, int start_offset = 0n997, int options = 0n8192, int * offsets = 0x052ac3f0, int offsetcount = 0n21)+0x149 [c:\php-sdk\php70dev\vc14\x86\php-7.0.0alpha1\ext\pcre\pcrelib\pcre_exec.c @ 6420]
php7!php_pcre_replace_impl(struct pcre_cache_entry * pce = 0x00ee58d8, struct _zend_string * subject_str = 0x0598fa00, char * subject = 0x0598fa10 "(snip)", int subject_len = 0n1235, struct _zval_struct * replace_val = 0x008143d0, int is_callable_replace = 0n1, int limit = 0n-1, int * replace_count = 0x052ac534)+0x148 [c:\php-sdk\php70dev\vc14\x86\php-7.0.0alpha1\ext\pcre\php_pcre.c @ 1120]
php7!php_pcre_replace+0x33 [c:\php-sdk\php70dev\vc14\x86\php-7.0.0alpha1\ext\pcre\php_pcre.c @ 1026]
php7!php_replace_in_subject(struct _zval_struct * regex = 0x008143c0, struct _zval_struct * replace = 0x008143d0, struct _zval_struct * subject = 0x008143e0, int limit = 0n-1, int is_callable_replace = 0n1, int * replace_count = 0x052ac534)+0x258 [c:\php-sdk\php70dev\vc14\x86\php-7.0.0alpha1\ext\pcre\php_pcre.c @ 1353]
php7!preg_replace_impl(struct _zval_struct * return_value = 0x00814370, struct _zval_struct * regex = 0x008143c0, struct _zval_struct * replace = 0x008143d0, struct _zval_struct * subject = 0x008143e0, int limit_val = 0n-1, int is_callable_replace = 0n1, int is_filter = 0n0)+0x264 [c:\php-sdk\php70dev\vc14\x86\php-7.0.0alpha1\ext\pcre\php_pcre.c @ 1411]
php7!zif_preg_replace_callback(struct _zend_execute_data * execute_data = <Value unavailable error>, struct _zval_struct * return_value = 0x00814370)+0x173 [c:\php-sdk\php70dev\vc14\x86\php-7.0.0alpha1\ext\pcre\php_pcre.c @ 1494]
php7!ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER+0x2e6e14 [c:\php-sdk\php70dev\vc14\x86\php-7.0.0alpha1\zend\zend_vm_execute.h @ 701]


Patches

pcre-refcount (last revision 2015-06-18 12:49 UTC by cmb@php.net)
quick-hack (last revision 2015-06-17 22:03 UTC by cmb@php.net)

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2015-06-17 17:53 UTC] cmb@php.net
-Status: Open +Status: Feedback -Assigned To: +Assigned To: cmb
 [2015-06-17 17:53 UTC] cmb@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.


 [2015-06-17 17:53 UTC] rasmus@php.net
-Status: Feedback +Status: Open -Assigned To: cmb +Assigned To:
 [2015-06-17 17:53 UTC] rasmus@php.net
Do you have a reproducing example? PCRE_CACHE_SIZE is 4096 elements. That sounds like more than "very extensive use" that sounds like "insane use".
 [2015-06-17 18:19 UTC] james dot h dot cracknell at gmail dot com
This will crash on Windows and Linux:

preg_replace_callback('/a/', function($m) {
  for($i = 0; $i < 10000; $i++)
    preg_replace('/foo'.$i.'bar/', 'baz', '???foo'.$i.'bar???');
  return 'b';
}, 'aa');
 [2015-06-17 18:34 UTC] cmb@php.net
Indeed, it does crash under 7.0.0alpha1 on Linux:
<http://3v4l.org/RY30H>. I can't reproduce the problem on Windows,
though (neither x86 nor x64, neither NTS nor TS).

Anyhow, the problem seems to have been fixed (see php7@20150601
result).
 [2015-06-17 18:54 UTC] james dot h dot cracknell at gmail dot com
At minimum Win x86 NTS still crashes at the most recent available snap (r7db113f@2015-06-17).
 [2015-06-17 18:58 UTC] james dot h dot cracknell at gmail dot com
Ah look, try again with echoed output:
http://3v4l.org/1cRNo
 [2015-06-17 22:03 UTC] cmb@php.net
The following patch has been added/updated:

Patch Name: quick-hack
Revision:   1434578591
URL:        https://bugs.php.net/patch-display.php?bug=69864&patch=quick-hack&revision=1434578591
 [2015-06-17 22:03 UTC] cmb@php.net
I'm still haven't been able to reproduce a segfault, but indeed
the results of the script are wrong. And yes, it is as you
suspected, James. The pcre_cache_entry is deleted even though it's
used later on.

Interestingly, pcre_cache_entry has a member refcount which
doesn't seem to be used (at least in master). This member might be
used to prevent entries that will still be needed later to be
deleted. See the attached patch "quick-hack" for a draft.

> That sounds like more than "very extensive use" that sounds like
> "insane use".

I agree. However, I'd rather fix the issue, if feasible, than to
tag it as WONTFIX. :)
 [2015-06-17 22:54 UTC] james dot h dot cracknell at gmail dot com
The patch looks as though it should do the trick.

> I'd rather fix the issue, if feasible, than to tag it as WONTFIX.

Well it's definitely a bug; as it stands once you are in preg_replace_callback, the PCRE functions are a timebomb; e.g:
<https://github.com/WordPress/WordPress/blob/f62bf61b2c59bf5484888f22d0b7dff0f0df050a/wp-includes/shortcodes.php#L200>
 [2015-06-18 09:17 UTC] ab@php.net
Christoph, imho the idea of your patch is good, however it looks some more complex. Currently the .refcount member is not initialized, so it's most likely not zero. This results it that entries would be never cleanup. To init it it's around the line 455 (see the new_entry var). When it's initialized, it's probably about going through the usages and see where else it has to be incremented (like the same pattern could be already compiled in some match usage, etc.). Does it sound feasible?

Btw I do get this crash and on Windows, same bt. Do you still use vc11?

Thanks.
 [2015-06-18 09:21 UTC] ab@php.net
-Status: Open +Status: Verified
 [2015-06-18 12:49 UTC] cmb@php.net
The following patch has been added/updated:

Patch Name: pcre-refcount
Revision:   1434631776
URL:        https://bugs.php.net/patch-display.php?bug=69864&patch=pcre-refcount&revision=1434631776
 [2015-06-18 12:50 UTC] cmb@php.net
I still have not been able to get a crash; I've tested with
several 7.0.0alpha1 binaries and own VC 14 x64 builds. The tests
always exited cleanly, albeit with wrong results (either NULL or
"ba" instead of "bb").

Anyhow, I've attached an improved patch, including initialization
of the .refcount member, and additional refcounting for all paths
where the PCE might have to be prevented from being garbage
collected.

The patch is against master; for PHP 5.6 most notably the hack to
get the PCE at the beginning of pcre_clean_cache() has to be
changed. This raises a somewhat related issue wrt.
php_free_pcre_cache(). Line 92[1] does basically the same, but
assumes that data is a zval*. However, it is supposed to be a
pcre_cache_entry**[2]. That is currently not a real issue, as
.value is the first member of struct _zval_struct (and zend_value
is a union) but if the layout will change, Z_PTR_P(data) would
return garbage.

[1] <https://github.com/php/php-src/blob/8c8ad8f40ed9af2d95057a078dbaa844d072cb68/ext/pcre/php_pcre.c#L92>
[2] <https://github.com/php/php-src/blob/8c8ad8f40ed9af2d95057a078dbaa844d072cb68/ext/pcre/php_pcre.c#L492>
 [2015-06-18 14:06 UTC] kalle@php.net
-Assigned To: +Assigned To: cmb
 [2015-06-18 14:06 UTC] kalle@php.net
Fix assignee (was lost after Rasmus' comment)
 [2015-06-18 20:31 UTC] ab@php.net
I can see many errors like this with the new patch and the snippet @james posted earlier

==54386== Invalid read of size 1
==54386==    at 0x409ED6D: ???
==54386==    by 0xD801E77: ???
==54386==    by 0xFFEFF3D8F: ???
==54386==  Address 0xe699641 is 65 bytes inside a block of size 264 free'd
==54386==    at 0x4C2BDEC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==54386==    by 0x4E14A2: free_read_only_data (pcre_jit_compile.c:2139)
==54386==    by 0x502C4A: _pcre_jit_free (pcre_jit_compile.c:10532)
==54386==    by 0x4D311E: pcre_free_study (pcre_study.c:1672)
==54386==    by 0x503C79: php_free_pcre_cache (php_pcre.c:96)
==54386==    by 0xA3DF4C: _zend_hash_del_el_ex (zend_hash.c:935)
==54386==    by 0xA3E032: _zend_hash_del_el (zend_hash.c:959)
==54386==    by 0xA3F4DD: zend_hash_apply_with_argument (zend_hash.c:1463)
==54386==    by 0x504D30: pcre_get_compiled_regex_cache (php_pcre.c:450)
==54386==    by 0x5073D7: php_pcre_replace (php_pcre.c:1028)
==54386==    by 0x508322: php_replace_in_subject (php_pcre.c:1361)
==54386==    by 0x508976: preg_replace_impl (php_pcre.c:1422)

But in general - yep, in this case it's probably better to concentrate to fix master first, and then to backport into 5.6 when we see it stable in master. Not sure what's wrong with refcounts, probably better just to debug it.

Thanks.
 [2015-06-19 13:40 UTC] cmb@php.net
Have you tried without the patch, Anatol? I get very similar
results before having applied the patch.

Nearly all of the warnings seem to be caused by
php_free_pcre_cache(). Storing non-zvals in a HashTable[1] might
not work reliably?

[1] <http://lxr.php.net/xref/PHP_TRUNK/ext/pcre/php_pcre.c#492>
 [2015-06-19 14:50 UTC] ab@php.net
Hi Christoph,

nope, HashTable is unlikely to cause an issue. It's a very core API which is used everywhere, so it's probably not causing this.

But if you're using the snippet from @james which is already know to cause crash, clear there'll be such kinds of errors. So don't even need to mention that. And the cause of it, as reported and also confirmed by yourself, is that the cache is freed without check which causes accesses to the invalid memory. Except you've found out something new ;)

Thanks.
 [2015-06-19 14:51 UTC] ab@php.net
i meant "using a crashy snippet without any patch" - so that's the plain situation where you see all the errors ... :)

Thanks.
 [2015-06-19 15:26 UTC] ab@php.net
Ah, as from the patch, it should be 

pcre_cache_entry *pce = (pcre_cache_entry *) Z_PTR_P(data);

but it probably won't fix the issue, just to mention.

Thanks.
 [2015-06-19 16:38 UTC] ab@php.net
Christoph, debugged a bit w/o your patch. Consider these two lines from the trace

pcre_study.c:1672
php_pcre.c:96

If you comment out the freeing of pce->extra, it'll pass but with memleaks. so that's not your patch crashing at all. So suspect some double free with pce->extras is the issue now. I think we should rethink the direction to go.

Thanks.
 [2015-06-20 00:26 UTC] cmb@php.net
Thanks for further investigation, Anatol, and the valuable hints.

Indeed, the valgrind issues are not (directly) related to the
patch. The following test script raises lots of these for current
master (w/o the patch):

    <?php
    for ($i = 0; $i < 10000; $i++) {
        preg_replace('/foo'.$i.'bar/', 'baz', '???foo'.$i.'bar???');
    }
    ?>
    
All this depends on pcre.jit. If this (php.ini) configuration
option is disabled, valgrind reports no problems. Also James' test
script runs fine with the patch if pcre.jit is disabled.

It seems to me that the valgrind/pcre.jit issue is not directly
related to this ticket, and that it might be best to open a new
ticket for it. I think it would be preferable to solve that issue
before finally tackling this ticket.

Anyhow, disregard my comments regarding storing PCE vs. zval in
the HashTable. I had not read the code carefully enough.
 [2015-06-20 15:46 UTC] ab@php.net
Hi Christoph,

great you stay on this. To me it's currently not clear whether it's not the same thing, so I would stay in the scope of this ticket therefore. But feel free to open another one, if you're sure.

What I've discovered yet - pcre_free_study() is yeah, JIT related. Furthermore - regarding to the man page, it requires a pointer produced by pcre_study(). It's not always the case currently. When relying on this information, maybe some more could come out.

I'd suggest just to investigate further till we come to the clear understanding about what's going on.

Thanks.
 [2015-06-22 12:03 UTC] cmb@php.net
-Assigned To: cmb +Assigned To:
 [2015-06-22 12:03 UTC] cmb@php.net
My further findings regarding the invalid read (w/o the
pcre-refcount patch):

* all entries in the cache have been studied[1] (if that failed
  extra==NULL, so pcre_free_study won't be called)
* it occurs only if entries are removed from the cache before
  shutdown[2]
* if the cache is circumvented (by constructing a zval with the
  pce instead of calling zend_hash_update_mem[3] and freeing that
  zval at the end of php_pcre_replace[4]) the invalid reads may
  still happen
* the valgrind backtrace is clear on where the respective memory
  block has been freed, but it doesn't show where the invalid read
  happens. Interestingly, frame #3 seems to be allocated on the
  stack.
  
However, I'm not able to find what is causing the invalid read. :(

[1] <https://github.com/php/php-src/blob/php-7.0.0alpha1/ext/pcre/php_pcre.c#L426>
[2] <https://github.com/php/php-src/blob/php-7.0.0alpha1/ext/pcre/php_pcre.c#L448>
[3] <https://github.com/php/php-src/blob/php-7.0.0alpha1/ext/pcre/php_pcre.c#L492>
[4] <https://github.com/php/php-src/blob/php-7.0.0alpha1/ext/pcre/php_pcre.c#L1013>
 [2015-06-22 13:45 UTC] ab@php.net
Hi Christoph,

thanks for the info. Yeah, debugged on this almost the whole Sunday, but no good result. I can confirm your insights, also I can add that i've tried several programs in pure C which didn't reproduce the crash. But what I saw also, that some pointer addresses are reused, so it could be something in PCRE which doesn't free/release properly.

While this doesn't affect the normal work (usage of so many patterns at once), this needs to be fixed. I continue on this, maybe a temporary fix could be increasing the cache size - that would make the whole at least a bit more robust.

Thanks.
 [2015-06-22 14:24 UTC] cmb@php.net
Indeed, it might be a libpcre issue, but I also was not able to
reproduce the behavior with a pure C program.

Increasing the cache size as a temporary fix might not be
necessary. I found that it seems to be sufficient to only keep the
pcre_extra data until shutdown; the compiled regex, the tables and
strings could be released, what would save some memory. So it
might be an option to store the pcre_extra data in a second cache
(a fixed sized array might be sufficient) instead of calling
pcre_free_study() in php_free_pcre_cache(), and to free the
members of the array in the shutdown function of the extension.

Another option might be to make the cache size an ini setting, so
users can adjust it if necessary.
 [2015-06-23 07:11 UTC] ab@php.net
Ok, lessons learned - when debugging something involving PCRE JIT issues, always add --smc-check=all to valgrind. JIT is a self modifying code which valgrind has to be told about explicitly. Grateful thanks Zoltán Herczeg for supporting us with investigations on this issue.

Christoph, looks like we can apply the patch fixing the actual issue reported :)

Thanks.
 [2015-06-23 11:04 UTC] cmb@php.net
-Assigned To: +Assigned To: cmb
 [2015-06-23 14:52 UTC] cmb@php.net
Automatic comment on behalf of cmb
Revision: http://git.php.net/?p=php-src.git;a=commit;h=a39beaa2514514892d34bc2e9fd89f7333a6ed1f
Log: Fixed bug #69864 (Segfault in preg_replace_callback)
 [2015-06-23 14:52 UTC] cmb@php.net
-Status: Verified +Status: Closed
 [2015-06-23 18:04 UTC] ab@php.net
Automatic comment on behalf of cmb
Revision: http://git.php.net/?p=php-src.git;a=commit;h=a39beaa2514514892d34bc2e9fd89f7333a6ed1f
Log: Fixed bug #69864 (Segfault in preg_replace_callback)
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Wed Jan 22 10:01:30 2025 UTC