Bug #71720 Segfault in preg_match()
Submitted: 2016-03-06 00:35 UTC Modified: 2016-03-06 03:37 UTC
From: hanno at hboeck dot de Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.6.19 OS: Linux
Private report: No CVE-ID: None
 [2016-03-06 00:35 UTC] hanno at hboeck dot de
Description:
------------
Trying to match a certain regular expression against a large enough input can crash PHP. The example below is a simplified test case, but this happened on a real production system with real code (FeedWordPress).

(The input size needed to trigger the crash differed between systems, but 15000 was large enough to crash all systems I tested.)

This happens with 5.6.18 and 5.5.32. Interestingly, with PHP 7 I can only reproduce it on some systems; on others it never crashes, even with insanely large inputs.

There are existing bug reports that appear to describe a similar problem, but they all refer to Windows. This is a stack exhaustion, probably related to PCRE's limits. I think the limits should be set so that a default configuration cannot crash.

Test script:
---------------
<?php preg_match("/(0)*/",str_repeat("0",15000));


Expected result:
----------------
Don't segfault.

Actual result:
--------------
Segfault.


History

 [2016-03-06 01:14 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2016-03-06 01:14 UTC] requinix@php.net
There are limits you can lower. Try pcre.backtrack_limit.
http://php.net/manual/en/pcre.configuration.php

Lowering them will make a regex fail sooner (perhaps resulting in a false negative), but if they are set too high, an inefficient regex on a large enough input string will smash the stack - and take down the entire PHP process.

Consider what your example regex is doing: it needs at least one stack frame for each of those '0's that it matches in case it has to backtrack. Apparently the limit for your system is something lower than what PCRE needs to handle that 15k string.

Every single bug report I've seen like this can be solved by making the regex more efficient - typically by removing unneeded parentheses, adjusting quantifiers, or in more extreme cases using once-only subpatterns. The alternative is lowering the pcre.backtrack_limit setting.
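
For illustration, here is a sketch of those kinds of rewrites applied to the example pattern (illustrative code, not from the report). Each variant matches the same run of zeros as /(0)*/ without needing a stack frame per matched character, but note that they also drop the per-iteration capture, so they only apply where that capture isn't needed:

<?php
// Illustrative rewrites (assumed, not from the report) of /(0)*/:
$input = str_repeat("0", 15000);

preg_match("/0*/", $input);     // unneeded parentheses removed: simple repeat, no frame per character
preg_match("/0*+/", $input);    // possessive quantifier: gives up backtracking into the run
preg_match("/(?>0*)/", $input); // once-only (atomic) subpattern, same effect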

Perhaps you should submit a support request with the FeedWordPress folks?
 [2016-03-06 02:53 UTC] hanno at hboeck dot de
Actually, it seems this was already reported to FeedWordPress:
https://github.com/radgeek/feedwordpress/pull/23

Nevertheless, I think this is something PHP should act on by shipping sane default limits. If a regexp can exhaust more memory than PHP provides for it in a default configuration, then the limits are wrong.
I played around a bit, and if I set either backtrack_limit or recursion_limit to 10000 it no longer crashes. I'm not sure whether that is too strict a limit and would have practical implications.
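
For illustration, a minimal sketch of that experiment (the value 10000 comes from the comment above; whether it is safe for other workloads is exactly the open question): with either limit lowered, the reproducer fails cleanly and preg_last_error() reports which limit fired, instead of the whole process segfaulting.

<?php
// Illustrative sketch, not code from the report.
ini_set('pcre.backtrack_limit', '10000');
ini_set('pcre.recursion_limit', '10000');

$result = preg_match("/(0)*/", str_repeat("0", 15000));

if ($result === false) {
    // false (rather than 0) means the engine gave up, not that nothing matched;
    // preg_last_error() tells us which limit was hit.
    var_dump(preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR
          || preg_last_error() === PREG_RECURSION_LIMIT_ERROR); // expected: bool(true)
}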
 [2016-03-06 03:37 UTC] requinix@php.net
There's the problem: what is a sane value? The answer is that it depends. Depends on the regex, on the system, on the code, on the developer... All PHP can do is pick a value that seems to work and educate users about why and when to change it. That applies to every other php.ini value too.

100k seems to work well enough for most cases. Just not this one.
 