php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Sec Bug #70345 Multiple vulnerabilities related to PCRE functions
Submitted: 2015-08-24 17:31 UTC Modified: 2015-09-01 18:44 UTC
From: bugs at themadbat dot com Assigned:
Status: Closed Package: PCRE related
PHP Version: 5.4 OS: Windows/Linux
Private report: No CVE-ID: None
 [2015-08-24 17:31 UTC] bugs at themadbat dot com
Description:
------------
The pcre_exec() function generates a list of "offsets", each consisting
of a start and an end position within the subject string. Throughout the code its often assumed that
for each "offset", the start position is smaller than or equal to the end position.
However, certain regular expressions break this assumption (see the testcase for an example).

This leads to:
- Multiple heap overflows (through preg_match() or preg_replace(), for instance): in the best case scenario these are simply denial-of-service; exploitation to achieve arbitrary code execution might be possible, but not trivial.

- Memory exhaustion (through preg_split(), for instance)

I'm not providing filenames / line numbers with this bug report because pcre_exec() is used extensively throughout the codebase, and from what I have seen, all of its uses fail to fully validate the returned offsets and are thus vulnerable, to varying degrees.

Because regular expression functions are often exposed to user input, I believe this could be a fairly serious bug.

Test script:
---------------
<?php

$regex = '/(?=xyz\K)/';
$subject = "aaaaxyzaaaa";

// Comment/uncomment below as wanted.
// All 3 functions are vulnerable (note, other functions are affected as well)
preg_match($regex, $subject, $matches);
preg_replace($regex, '\0', $subject);
preg_split($regex, $subject);


Actual result:
--------------
A trace for preg_match, showing a SIGSEGV caused by a runaway memcpy:

(gdb) r
Starting program: php-5.6.12/sapi/cli/php go.php

Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:118
118     ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) bt
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:118
#1  0x000000000047b6fc in memcpy (__len=18446744073709551613, __src=<optimized out>, __dest=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
#2  php_pcre_get_substring_list (subject=subject@entry=0x7ffff7ebbe18 "aaaaxyzaaaa", ovector=ovector@entry=0x7ffff7fcebc0, stringcount=stringcount@entry=1, listptr=listptr@entry=0x7fffffffaee8)
    at php-5.6.12/ext/pcre/pcrelib/pcre_get.c:477
#3  0x00000000004a569f in php_pcre_match_impl (pce=pce@entry=0x1012cf0, subject=0x7ffff7ebbe18 "aaaaxyzaaaa", subject_len=11, return_value=return_value@entry=0x7ffff7fcea58, subpats=0x7ffff7fce900, global=global@entry=0, use_flags=0, 
    flags=0, start_offset=0) at php-5.6.12/ext/pcre/php_pcre.c:707
#4  0x00000000004a614b in php_do_pcre_match (ht=3, return_value=0x7ffff7fcea58, global=0, return_value_used=<optimized out>, this_ptr=<optimized out>, return_value_ptr=<optimized out>) at php-5.6.12/ext/pcre/php_pcre.c:575
#5  0x000000000077894e in zend_do_fcall_common_helper_SPEC (execute_data=<optimized out>) at php-5.6.12/Zend/zend_vm_execute.h:558
#6  0x000000000070d968 in execute_ex (execute_data=0x7ffff7f9a158) at php-5.6.12/Zend/zend_vm_execute.h:363
#7  0x00000000006d4740 in zend_execute_scripts (type=type@entry=8, retval=retval@entry=0x0, file_count=file_count@entry=3) at php-5.6.12/Zend/zend.c:1341
#8  0x00000000006722f2 in php_execute_script (primary_file=primary_file@entry=0x7fffffffd500) at php-5.6.12/main/main.c:2597
#9  0x000000000077a5bf in do_cli (argc=2, argv=0xe93760) at php-5.6.12/sapi/cli/php_cli.c:994
#10 0x0000000000426980 in main (argc=2, argv=0xe93760) at php-5.6.12/sapi/cli/php_cli.c:1378

Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2015-08-25 08:38 UTC] kaplan@php.net
I see a few related fixes staged for the next pcrelib release (8.38) at http://vcs.pcre.org/pcre/code/trunk/ChangeLog?view=markup
 [2015-08-25 10:28 UTC] ab@php.net
-Status: Open +Status: Feedback
 [2015-08-25 10:28 UTC] ab@php.net
Hi,

this case was discussed on the sec lists a month ago, there was a patch suggestion https://gist.github.com/weltling/b77f85be31c55640cb67 but seems it did not make it into the last sec release (but there was a plenty of security fixes, so most likely lost in the flood). Could you please check the patch?

The following I wrote there in the thread:

=====================
In general – when pcre_exec() devilers 1, it means there was no substrings matched, so we shouldn’t call pcre_get_substring_list(). However we do it, so let’s just check whether the offsets are correct before doing it. More info also in pcreapi.3. This patch is better than previous one because it avoids a lot of function calls.

I was also contacting Philip, the actual dev of the PCRE. This is what comes from him on the PCRE2 doc page http://www.pcre.org/pcre2.txt

“ If  a  pattern uses the \K escape sequence within a positive assertion,
       the reported start of a successful match can be greater than the end of
       the  match.   For  example,  if the pattern (?=ab\K) is matched against
       "ab", the start and end offset values for the match are 2 and 0.”
=====================

So it is obviously a bug in PHP. 5.4 affected as well.

Thanks.
 [2015-08-25 10:28 UTC] ab@php.net
-PHP Version: 5.6.12 +PHP Version: 5.4
 [2015-08-25 11:10 UTC] bugs at themadbat dot com
-Status: Feedback +Status: Open
 [2015-08-25 11:10 UTC] bugs at themadbat dot com
Hello,

ab: I think you are right, and the PCRE dev does not consider this a bug. So it probably hasn't been/won't be changed in PCRE.

The patch looks OK to me, but it only covers 1 of the instances of pcre_exec() in the codebase.
For example the preg_replace() in my testcase will still cause a heap overflow.
I think, in general, a check similar to the one in the patch will have to be added after every instance of pcre_exec() where the
returned offsets are subsequently used by PHP.

Here's the backtrace for a preg_replace() crash:

Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:152
152     ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) bt
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:152
#1  0x00000000004a6a6f in memcpy (__len=18446744073709551613, __src=<optimized out>, __dest=0x7ffff7fcd337) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
#2  php_pcre_replace_impl (pce=pce@entry=0x1012cf0, subject=subject@entry=0x7ffff7ebbe18 "aaaaxyzaaaa", subject_len=subject_len@entry=11, replace_val=replace_val@entry=0x7ffff7fce890, is_callable_replace=is_callable_replace@entry=0, 
    result_len=result_len@entry=0x7fffffffaf6c, limit=limit@entry=-1, replace_count=replace_count@entry=0x7fffffffaf74) at php-5.6.12/ext/pcre/php_pcre.c:1241
#3  0x00000000004a71e3 in php_pcre_replace (regex=<optimized out>, regex_len=<optimized out>, subject=0x7ffff7ebbe18 "aaaaxyzaaaa", subject_len=11, replace_val=replace_val@entry=0x7ffff7fce890, 
    is_callable_replace=is_callable_replace@entry=0, result_len=result_len@entry=0x7fffffffaf6c, limit=limit@entry=-1, replace_count=replace_count@entry=0x7fffffffaf74) at php-5.6.12/ext/pcre/php_pcre.c:1052
#4  0x00000000004a72f9 in php_replace_in_subject (regex=0x7ffff7fce8f0, replace=0x7ffff7fce890, subject=0x7ffff7f9a1e0, result_len=result_len@entry=0x7fffffffaf6c, limit=limit@entry=-1, is_callable_replace=is_callable_replace@entry=0, 
    replace_count=replace_count@entry=0x7fffffffaf74) at php-5.6.12/ext/pcre/php_pcre.c:1383
#5  0x00000000004a7905 in preg_replace_impl (ht=3, return_value=0x7ffff7fce8c0, is_callable_replace=0, is_filter=0, return_value_used=<optimized out>, this_ptr=<optimized out>, return_value_ptr=<optimized out>)
    at php-5.6.12/ext/pcre/php_pcre.c:1482
#6  0x000000000077894e in zend_do_fcall_common_helper_SPEC (execute_data=<optimized out>) at php-5.6.12/Zend/zend_vm_execute.h:558
#7  0x000000000070d968 in execute_ex (execute_data=0x7ffff7f9a118) at php-5.6.12/Zend/zend_vm_execute.h:363
#8  0x00000000006d4740 in zend_execute_scripts (type=type@entry=8, retval=retval@entry=0x0, file_count=file_count@entry=3) at php-5.6.12/Zend/zend.c:1341
#9  0x00000000006722f2 in php_execute_script (primary_file=primary_file@entry=0x7fffffffd500) at php-5.6.12/main/main.c:2597
#10 0x000000000077a5bf in do_cli (argc=2, argv=0xe93760) at php-5.6.12/sapi/cli/php_cli.c:994
#11 0x0000000000426980 in main (argc=2, argv=0xe93760) at php-5.6.12/sapi/cli/php_cli.c:1378
 [2015-08-25 17:53 UTC] ab@php.net
-Status: Open +Status: Feedback
 [2015-08-25 17:53 UTC] ab@php.net
Thanks for the follow up. I've reworked the patch so now all three functions should be fixed https://gist.github.com/weltling/b77f85be31c55640cb67 

Thanks.
 [2015-08-25 19:02 UTC] bugs at themadbat dot com
-Status: Feedback +Status: Open
 [2015-08-25 19:02 UTC] bugs at themadbat dot com
Looks good; seems preg_split() has the same problem; but I think that should be the last one, at least in php_pcre.c ... sorry I didnt include them all in the initial report but as I said there's many instances where offset validation was needed. Here's the test case, should lead to memory exhaustion, didn't check if more severe exploitation is possible:

<?php
$regex = '/(?=xyz\K)/';
$subject = "aaaaxyzaaaa";

$v = preg_split($regex, $subject);
print_r($v);
?>
 [2015-08-25 19:16 UTC] bugs at themadbat dot com
ab: Did some more testing and it seems its not enough to check the offsets only when count == 1.
The testcase below returns 2, and the offsets are [7,4] and [3,4]. The first pair still breaks the assumption.
Note the regular expression in this test case is slightly different.

<?php
$regex = '/(a(?=xyz\K))/';
$subject = "aaaaxyzaaaa";
preg_match($regex, $subject, $matches);
 [2015-08-26 11:48 UTC] ab@php.net
-Status: Open +Status: Feedback
 [2015-08-26 11:48 UTC] ab@php.net
Thanks for the further checks. I've updated the gist in https://gist.github.com/weltling/b77f85be31c55640cb67 . This should fix the crashes, however doesn't fix the functional part. I would still suggest to apply this at least to 5.4 and 5.5 (if to upper as well), because fixing the functional part is going to be quite a big chunk that can have subsequent bugs. So for the branches in security only mode no functional fix should be done, as it can imply further development need.

About the functional part:

$regex = '/(?=xyz\K)/';
$subject = "aaaaxyzaaaa";

This should match with "xyz". And not sure, but it should split like "aaaaxyz" and "xyzaaaa". 

$regex = '/(a(?=xyz\K))/';
$subject = "aaaaxyzaaaa";

This should should match with "xyz" and "a".

As you've spotted as well, under circumstances offsets can appear in reverse order. Either some of them or all. Achieving this to be handled accordingly will require quite some refactoring in the PHP ext. So maybe even merging only the crash fix in all the branches were appropriate to have stability in the next release, and then going with a good implementation for 5.6+ afterwards.

Thanks.
 [2015-08-26 12:25 UTC] bugs at themadbat dot com
-Status: Feedback +Status: Open
 [2015-08-26 12:25 UTC] bugs at themadbat dot com
I'll let you decide where to apply the fix, because I'm not familiar with how you generally deal with these things within the PHP project.

>> "As you've spotted as well, under circumstances offsets can appear in reverse order. Either some of them or all."
Yea... I think perhaps only the first offset can be in reverse order. But its complicated to make sure of this because the PCRE source is quite complex. I think your patch looks good. In the future you could indeed check all the returned offsets to see if they're all in the expected order.

Regarding the functional part; I'm not sure anything needs to be changed. The (?=) syntax indicates a look-ahead, and those are not captured.
For instance,

$regex = '/(a(?=xyz\K))/';
$subject = "aaaaxyzaaaa";

I think this should match on "a", as long as its followed by "xyz", but only the "a" is captured. Its hard to say what the 'correct' way to implement preg_split(), for instance, would be in this case.

I expect these cases are very very rare in actual usage, so the most important thing is to patch the crashes, for security reasons.
 [2015-08-29 05:50 UTC] stas@php.net
The patch seems to be fine with regard to offset handling, but when running this:

regex = '/(a(?=xyz\K))/';
$subject = "aaaaxyzaaaa";
preg_match($regex, $subject, $matches);

I get:

Warning: preg_match(): Get subpatterns list failed in /Users/smalyshev/php-5.4/mamp/70345.php on line 10

Is this the right behavior?
 [2015-08-30 15:32 UTC] bugs at themadbat dot com
stas: You're right, that is not really correct behaviour. As I said I'm not sure if this would have any effect for practical use of the pcre_* functions (the regex is overly complicated, the same thing could be done with simpler regexes).

However, maybe a better patch would be to iterate over the offsets returned by pcre_exec() and swap "start" and "end", when "start" > "end"? Then the rest of the pcre related code could stay unchanged.
 [2015-08-31 09:40 UTC] ab@php.net
Regarding '/(a(?=xyz\K))/' and the patch - functionally it is not correct. As mentioned before, it should match with "xyz" and "a". The \K patterns will deliver the offsets in reverse order. At least from what i've seen till now, even the offsets for subpatterns will be in reverse order. So in fact it should be fine to have just checked the first offsets pair. 

I've been working on a patch for getting the correct functionality, however it's not ready yet and can be a bit bigger than this one. Taking in account the fact that it'll be possibly the last 5.4 release and 5.5 in sec only, I'm quite unsure that it would be safe putting such a functional fix in there. Possibly there can be another bug in the implementation (as the patch will be bigger than just some lines) so that can require further development.

IMHO what should be done is getting in the crash fix in all the branches in first place, than the functional part should be fixed in 5.6+.

Thanks.
 [2015-09-01 18:55 UTC] stas@php.net
Automatic comment on behalf of stas
Revision: http://git.php.net/?p=php-src.git;a=commit;h=03964892c054d0c736414c10b3edc7a40318b975
Log: Fix bug #70345 (Multiple vulnerabilities related to PCRE functions)
 [2015-09-01 18:55 UTC] stas@php.net
-Status: Open +Status: Closed
 [2015-09-01 19:04 UTC] stas@php.net
Automatic comment on behalf of stas
Revision: http://git.php.net/?p=php-src.git;a=commit;h=03964892c054d0c736414c10b3edc7a40318b975
Log: Fix bug #70345 (Multiple vulnerabilities related to PCRE functions)
 [2015-09-01 19:07 UTC] stas@php.net
Automatic comment on behalf of stas
Revision: http://git.php.net/?p=php-src.git;a=commit;h=03964892c054d0c736414c10b3edc7a40318b975
Log: Fix bug #70345 (Multiple vulnerabilities related to PCRE functions)
 [2015-09-02 08:29 UTC] stas@php.net
Automatic comment on behalf of stas
Revision: http://git.php.net/?p=php-src.git;a=commit;h=03964892c054d0c736414c10b3edc7a40318b975
Log: Fix bug #70345 (Multiple vulnerabilities related to PCRE functions)
 [2015-09-03 18:10 UTC] ab@php.net
Automatic comment on behalf of stas
Revision: http://git.php.net/?p=php-src.git;a=commit;h=03964892c054d0c736414c10b3edc7a40318b975
Log: Fix bug #70345 (Multiple vulnerabilities related to PCRE functions)
 
PHP Copyright © 2001-2025 The PHP Group
All rights reserved.
Last updated: Wed Jan 22 11:01:28 2025 UTC