Bug #74132 preg_match_all hitting backtrack limit in master and not in released versions
Submitted: 2017-02-20 04:14 UTC
Modified: 2018-03-21 13:43 UTC
From: liyan at bianhua8 dot com
Assigned: cmb
Status: Closed
Package: PCRE related
PHP Version: 7.1Git-2017-02-20 (Git)
OS: ubuntu
Private report: No
CVE-ID: None
 [2017-02-20 04:14 UTC] liyan at bianhua8 dot com
Description:
------------
After calling preg_match_all(), the third parameter $matches is populated
with an array of matches, but the return value is false.


Test script:
---------------
<?php
$content = '<th class="title">
    <div class="subject">
        <a href="158391487256182472-1-1.html" title="Jomashop:Swarovski"></a>
    </div>
</th>
<td class="author">
    <a href="profile-42357964-1.html" title="author"><span>cat</span></a>
    <span>2017-02-16</span>
</td>
<div class="link0 list-side-hd moderator-noborder obj2subject"><h3 class="fl"></h3></div><ul class="list-side-bd list-recommend list-recommend2 link1"><li><a target="_blank" href="//go.cqmmgo.com/forum-462505-thread-94061456905358538-1-1.html"><img width="120" height="160" title="#" alt="#" src="//att3.citysbs.com/120x120/chongqing/2016/03/09/09/160x120-092826_v2_11861457486906377_58b326482e2aa1f1da0f8455ac42187f.jpg"></a></li><li><a target="_blank" href="//go.cqmmgo.com/forum-462505-thread-179021457000027333-1-1.html"><img width="120" height="160" title="#" alt="#" src="//att3.citysbs.com/120x120/chongqing/2016/03/09/09/160x120-092826_v2_20771457486906773_31eb80d49e9ef06418003e577cc1453f.jpg"></a></li><li><a target="_blank" href="//go.cqmmgo.com/forum-462505-thread-93091456983256969-1-1.html"><img width="120" height="160" title="#" alt="#" src="//att3.citysbs.com/120x120/chongqing/2016/03/09/09/160x120-092827_v2_20231457486907098_05bc3d4608c961af0660722e03049e20.jpg"><span>abc</span></a></li><li><a target="_blank" href="//go.cqmmgo.com/forum-462505-thread-13801456844284137-1-1.html"><img width="120" height="160" title="#" alt="#" src="//att3.citysbs.com/120x120/chongqing/2016/03/09/09/160x120-092827_v2_14121457486907436_d49c002ba8675553c5436e80f305aca7.jpg"><span>xxxxxxxxxxxxxxxxxxxxxxxxxxxxx</span></a></li></ul></div>';
$ret = preg_match_all('/subject[\s\S]+?href="(.+?)"[\s\S]+?title="([\s\S]+?)"[\s\S]+?profile/', $content, $matches);

var_dump($matches); // contains the matched results
var_dump($ret); // bug: the return value is false


 [2017-02-20 08:18 UTC] requinix@php.net
-Status: Open +Status: Feedback
 [2017-02-20 08:18 UTC] requinix@php.net
Seems to be working fine. https://3v4l.org/ApKvQ
 [2017-02-20 09:14 UTC] liyan at bianhua8 dot com
-Status: Feedback +Status: Open
 [2017-02-20 09:14 UTC] liyan at bianhua8 dot com
Thanks.
Copy the script below to https://3v4l.org/ApKvQ without reformatting it, and the bug recurs.

The output:
array(3) {
  [0]=>
  array(1) {
    [0]=>
    string(147) "subject">
        <a href="158391487256182472-1-1.html" title="Jomashop:Swarovski"></a>
    </div>
</th>
<td class="author">
    <a href="profile"
  }
  [1]=>
  array(1) {
    [0]=>
    string(27) "158391487256182472-1-1.html"
  }
  [2]=>
  array(1) {
    [0]=>
    string(20) "Jomashop:Swarovski"
  }
}
bool(false)
 [2017-02-20 09:59 UTC] requinix@php.net
-Status: Open +Status: Feedback
 [2017-02-20 09:59 UTC] requinix@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.

Copy what?
 [2017-02-20 12:02 UTC] liyan at bianhua8 dot com
-Status: Feedback +Status: Open
 [2017-02-20 12:02 UTC] liyan at bianhua8 dot com
The following is the test script, the same as in my first post.

<?php
$content = '<th class="title">
    <div class="subject">
        <a href="158391487256182472-1-1.html" title="Jomashop:Swarovski"></a>
    </div>
</th>
<td class="author">
    <a href="profile-42357964-1.html" title="author"><span>cat</span></a>
    <span>2017-02-16</span>
</td>
<div class="link0 list-side-hd moderator-noborder obj2subject"><h3 class="fl"></h3></div><ul class="list-side-bd list-recommend list-recommend2 link1"><li><a target="_blank" href="//go.cqmmgo.com/forum-462505-thread-94061456905358538-1-1.html"><img width="120" height="160" title="#" alt="#" src="//att3.citysbs.com/120x120/chongqing/2016/03/09/09/160x120-092826_v2_11861457486906377_58b326482e2aa1f1da0f8455ac42187f.jpg"></a></li><li><a target="_blank" href="//go.cqmmgo.com/forum-462505-thread-179021457000027333-1-1.html"><img width="120" height="160" title="#" alt="#" src="//att3.citysbs.com/120x120/chongqing/2016/03/09/09/160x120-092826_v2_20771457486906773_31eb80d49e9ef06418003e577cc1453f.jpg"></a></li><li><a target="_blank" href="//go.cqmmgo.com/forum-462505-thread-93091456983256969-1-1.html"><img width="120" height="160" title="#" alt="#" src="//att3.citysbs.com/120x120/chongqing/2016/03/09/09/160x120-092827_v2_20231457486907098_05bc3d4608c961af0660722e03049e20.jpg"><span>abc</span></a></li><li><a target="_blank" href="//go.cqmmgo.com/forum-462505-thread-13801456844284137-1-1.html"><img width="120" height="160" title="#" alt="#" src="//att3.citysbs.com/120x120/chongqing/2016/03/09/09/160x120-092827_v2_14121457486907436_d49c002ba8675553c5436e80f305aca7.jpg"><span>xxxxxxxxxxxxxxxxxxxxxxxxxxxxx</span></a></li></ul></div>';
$ret = preg_match_all('/subject[\s\S]+?href="(.+?)"[\s\S]+?title="([\s\S]+?)"[\s\S]+?profile/', $content, $matches);

var_dump($matches); // contains the matched results
var_dump($ret); // bug: the return value is false
?>
 [2017-02-20 13:31 UTC] requinix@php.net
-Summary: preg_match_all return value error. +Summary: preg_match_all hitting backtrack limit in master and not in released versions
 [2017-02-20 13:31 UTC] requinix@php.net
It looks like libpcre in master is hitting the backtrack limit. preg_match_all() returning false means there was an error during matching, but it can still "return" any matches that were found. JIT on or off doesn't seem to make a difference here.

I don't know why master cannot complete the matching; libpcre hasn't been upgraded in about a year...
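
To make the "false means an error during matching" point concrete, here is a minimal sketch of checking preg_last_error() after the call. The helper name match_all_checked() is hypothetical and not part of the report; note that it throws instead of handing back whatever partial matches were collected.

```php
<?php
// Hypothetical helper, not part of the original report: run preg_match_all()
// and turn a false return into an exception naming the PCRE error.
function match_all_checked(string $pattern, string $subject): array
{
    $count = preg_match_all($pattern, $subject, $matches);
    if ($count === false) {
        // preg_last_error() reports why matching stopped.
        $names = [
            PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
            PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
            PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
            PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
            PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
            PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR',
        ];
        $code = preg_last_error();
        throw new RuntimeException($names[$code] ?? "PCRE error $code");
    }
    return $matches;
}

// Success path: $matches comes back in the default PREG_PATTERN_ORDER layout.
print_r(match_all_checked('/a(n)/', 'banana'));
```

Called with the pattern and $content from the test script, a helper like this would be expected to throw PREG_BACKTRACK_LIMIT_ERROR on a build that shows the bug, rather than leaving the caller with a bare false.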


Solution:
Besides increasing the backtrack limit, you can make a simple change to your regex. The problem is that PCRE will try a second match starting at 
  <div class="link0 list-side-hd moderator-noborder obj2subject">
                                                        ^
and reach the backtrack limit before it fails to match.

If I modify the regex as
  /"subject"[\s\S]+?...
then it matches correctly and returns successfully for me.
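
A small sketch of why the quoted form helps, assuming a shortened stand-in for the markup in the report: the bare word subject also occurs inside obj2subject, so the engine has a second place to start a match attempt, while "subject" with its quotes occurs only once. The function name candidate_starts() is made up for illustration.

```php
<?php
// Count how often a literal prefix occurs, i.e. the places where the regex
// engine can begin a match attempt for a pattern that starts with it.
function candidate_starts(string $prefix, string $html): int
{
    return preg_match_all('/' . preg_quote($prefix, '/') . '/', $html);
}

// Shortened stand-in for the markup in the report (an assumption).
$html = '<div class="subject"><a href="a.html" title="t"></a></div>'
      . '<a href="profile-1.html"></a>'
      . '<div class="moderator-noborder obj2subject"></div>';

var_dump(candidate_starts('subject', $html));   // int(2)
var_dump(candidate_starts('"subject"', $html)); // int(1)
```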


Side comment: don't use regular expressions to parse HTML. Use DOM.
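
For the DOM suggestion, a rough sketch using DOMDocument and DOMXPath might look like the following. It assumes the dom extension is loaded; the helper name subject_links() and the shortened markup are illustrative only, not taken from the report.

```php
<?php
// Hypothetical helper: list href/title of links directly inside
// <div class="subject"> using DOM + XPath instead of a regex.
function subject_links(string $html): array
{
    // Collect parser warnings internally; HTML fragments often trigger them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    $xpath = new DOMXPath($doc);
    // @class="subject" compares the whole attribute value, so an element
    // with class="link0 obj2subject" is not selected.
    foreach ($xpath->query('//div[@class="subject"]/a') as $a) {
        $links[] = [
            'href'  => $a->getAttribute('href'),
            'title' => $a->getAttribute('title'),
        ];
    }
    return $links;
}

// Shortened stand-in for the markup in the report (an assumption).
$html = '<div class="subject"><a href="158391487256182472-1-1.html" title="Jomashop:Swarovski"></a></div>'
      . '<div class="link0 obj2subject"><a href="other.html" title="x"></a></div>';
print_r(subject_links($html));
```

Because the XPath expression selects elements by structure, there is no backtracking to limit, and the obj2subject block cannot produce a false start.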
 [2017-02-20 13:35 UTC] andrew dot nester dot dev at gmail dot com
As far as I can see from this code, you really are getting an error: PHP_PCRE_BACKTRACK_LIMIT_ERROR (exposed to scripts as PREG_BACKTRACK_LIMIT_ERROR).
You can set a higher backtrack limit like this and you'll be fine (tested on my end):
ini_set("pcre.backtrack_limit", "10000000");
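
If the higher limit is only wanted for one call, a possible sketch is to raise it temporarily and restore the old value afterwards. The helper name with_backtrack_limit() is hypothetical; the only APIs it relies on are ini_set() and ini_get().

```php
<?php
// Hypothetical helper: run $fn with a temporary pcre.backtrack_limit and
// restore the previous value afterwards, even if $fn throws.
function with_backtrack_limit(int $limit, callable $fn)
{
    // ini_set() returns the old value as a string, or false on failure.
    $old = ini_set('pcre.backtrack_limit', (string) $limit);
    try {
        return $fn();
    } finally {
        if ($old !== false) {
            ini_set('pcre.backtrack_limit', $old);
        }
    }
}

// Usage: the limit is raised only while the callback runs.
$inside = with_backtrack_limit(10000000, function () {
    return ini_get('pcre.backtrack_limit');
});
var_dump($inside);
```

The callback would normally wrap the preg_match_all() call itself; scoping the change this way keeps the rest of the script on the configured default.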
 [2017-02-21 06:05 UTC] liyan at bianhua8 dot com
-Status: Open +Status: Closed
 [2017-02-21 06:05 UTC] liyan at bianhua8 dot com
I see. Thanks a lot, requinix and andrew :)
 [2017-02-21 06:19 UTC] requinix@php.net
-Status: Closed +Status: Re-Opened
 [2017-02-21 06:19 UTC] requinix@php.net
I'm not convinced this isn't a bug. It might not be, but something changed, and I'm not sure whether the change came from upstream libpcre or whether it was intentional.
 [2018-03-21 13:43 UTC] cmb@php.net
-Status: Re-Opened +Status: Closed -Assigned To: +Assigned To: cmb
 [2018-03-21 13:43 UTC] cmb@php.net
<https://3v4l.org/ApKvQ> works fine, so I assume this has been a
temporary issue.
 