Bug #73762 recursive pattern breaks from pcre 8.36 to 8.37
Submitted: 2016-12-16 16:51 UTC Modified: 2016-12-17 05:14 UTC
From: zhihong dot chen dot cn at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.6.9 OS: Ubuntu 14.04.5 X64
Private report: No CVE-ID: None
 [2016-12-16 16:51 UTC] zhihong dot chen dot cn at gmail dot com
Description:
------------
Starting with PHP 5.6.9, after the bundled PCRE was upgraded to 8.37 (and in later versions), recursive matching breaks and generates a different result.

PHP 5.6.8 and earlier versions generate the expected result.

Later versions generate an incorrect result.



Test script:
---------------
<?PHP
const pattern =  "#\{(?:(?<name>\w+))(?:/\}|\}(?<snippet>(?:[^{]*|(?!\{\w+:\})++|(?R))*+)\{end\})#mu";
$ss = <<<'EOD'
{first} {second}within second{end} {end}
EOD;

function testRun($input ){
    echo $input[0];
    echo PHP_EOL;
}

preg_replace_callback( pattern ,'testRun',$ss);


Expected result:
----------------
{first} {second}within second{end} {end}


Actual result:
--------------
{second}within second{end}


History

 [2016-12-17 05:13 UTC] requinix@php.net
-Summary: recursive pattern breaks from pcre 8.3.36
+Summary: recursive pattern breaks from pcre 8.36 to 8.37
-Status: Open
+Status: Not a bug
 [2016-12-17 05:13 UTC] requinix@php.net
This would be a behavioral change in pcrelib, not in PHP.

Looking through the PCRE 8.37 changelog, nothing stands out as a change that would obviously affect your regex. However, item 11 seems at least partially relevant:
>11. If an assertion that was used as a condition was quantified with a minimum
>    of zero, matching went wrong. In particular, if the whole group had
>    unlimited repetition and could match an empty string, a segfault was
>    likely. The pattern (?(?=0)?)+ is an example that caused this. Perl allows
>    assertions to be quantified, but not if they are being used as conditions,
>    so the above pattern is faulted by Perl. PCRE has now been changed so that
>    it also rejects such patterns.
http://www.pcre.org/original/changelog.txt

Fixing that may have exposed the problem with your regex:

#\{(?:(?<name>\w+))(?:/\}|\}(?<snippet>(?:[^{]*|(?!\{\w+:\})++|(?R))*+)\{end\})#mu
                                                            ^1       ^2

1. A quantifier on an assertion doesn't make sense, even if it is allowed. Apparently Perl allows it, so pcrelib does too. This is minor and doesn't affect the matching.
2. The possessive quantifier does affect the matching. The snippet group matches the space after {first}, but because *+ never backtracks into the group, the (?R) alternative is never tried at {second}. \{end\} then fails to match, so the attempt starting at {first} fails, and the engine moves on to later starting positions until it eventually finds a match at {second}.
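A smaller illustration of point 2, using a made-up pattern rather than the one from this report: once a possessive group has settled on its first successful iterations, it never backtracks to try a different alternative, so a match that needs the other alternative is lost.

```php
<?php
// Made-up example, not the reporter's pattern.
// Greedy "*": after "c" fails against "b", the engine backtracks into the
// group, tries the "ab" alternative instead of "a", and then "c" matches.
var_dump(preg_match('/^(?:a|ab)*c/', 'abc'));   // int(1)

// Possessive "*+": the group keeps its first result ("a") and gives nothing
// back, so "c" is compared against "b" and the anchored match fails.
var_dump(preg_match('/^(?:a|ab)*+c/', 'abc'));  // int(0)
```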

If I change the regex to
  #\{(?:(?<name>\w+))(?:/\}|\}(?<snippet>(?:[^{]*|(?!\{\w+:\})|(?R))*)\{end\})#mu
                                                             ^      ^
then I get the expected results in all versions of PHP.
https://3v4l.org/2BrI6
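A rough side-by-side check of the two patterns, as a sketch: it assumes PHP 5.4 or later (short array syntax), and fullMatches() is a hypothetical helper introduced here, not a PHP function. It runs both patterns against the reporter's input with preg_match_all() and prints the full-match text of every match. According to this report, on PCRE 8.37 and later only the changed pattern returns the whole {first} ... {end} string.

```php
<?php
// Sketch: compare the reporter's pattern with the changed one on the same input.
// Single-quoted strings keep the backslashes exactly as written.
$original = '#\{(?:(?<name>\w+))(?:/\}|\}(?<snippet>(?:[^{]*|(?!\{\w+:\})++|(?R))*+)\{end\})#mu';
$changed  = '#\{(?:(?<name>\w+))(?:/\}|\}(?<snippet>(?:[^{]*|(?!\{\w+:\})|(?R))*)\{end\})#mu';

$input = '{first} {second}within second{end} {end}';

// Hypothetical helper: returns the full-match text (group 0) of every match.
function fullMatches($pattern, $subject)
{
    // preg_match_all() returns false on error; report no matches in that case.
    if (preg_match_all($pattern, $subject, $m) === false) {
        return [];
    }
    return $m[0];
}

foreach (['original' => $original, 'changed' => $changed] as $label => $pattern) {
    echo $label, ': ', implode(' | ', fullMatches($pattern, $input)), PHP_EOL;
}
```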
 [2016-12-17 05:14 UTC] requinix@php.net
-PHP Version: 7.0.14
+PHP Version: 5.6.9
 [2016-12-17 09:17 UTC] zhihong dot chen dot cn at gmail dot com
Many thanks.

It's a problem with my pattern.

This bug can be closed as Invalid.
 [2016-12-17 15:11 UTC] zhihong dot chen dot cn at gmail dot com
It seems that PHP 5.6.9 and later behave correctly, while PHP 5.6.8 and earlier accept the incorrect pattern.

After regression testing, I confirmed that the changed pattern generates the correct result.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Apr 25 05:01:33 2024 UTC