Bug #67235 Possessive quantifier problem with preg_match()
Submitted: 2014-05-08 20:41 UTC
Modified: 2014-05-12 12:56 UTC
From: david dot a dot schmitt at verizon dot com
Assigned:
Status: Closed
Package: PCRE related
PHP Version: 5.5.12
OS: Linux (Fedora 20)
Private report: No
CVE-ID: None
 [2014-05-08 20:41 UTC] david dot a dot schmitt at verizon dot com
Description:
------------
1. preg_match() behaviour does not match pcregrep behaviour for some patterns involving possessive quantifiers.
2. preg_match() behaviour also appears internally inconsistent when switching from greedy to possessive quantifiers in those cases (+ to ++).

Below is my environment (Fedora 20):

$ php -v
PHP 5.5.12 (cli) (built: May  3 2014 07:10:11) 
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
$ pcregrep -V
pcregrep version 8.33 2013-05-28
$ php -r "phpinfo();" | grep PCRE
PCRE (Perl Compatible Regular Expressions) Support => enabled
PCRE Library Version => 8.33 2013-05-28

Test script:
---------------
#!/bin/sh

export GREEDY='\A(?:[^"]++|"(?:[^"]*+|"")*+")+'
export POSSESSIVE="$GREEDY+"
export TEXT='NON QUOTED "QUOT""ED"'
export PATTERN

SCRIPT='
$t = getenv("TEXT");
$p = getenv("PATTERN");
preg_match("/$p/", $t, $m);
print "$m[0]\n";
'

for PATTERN in "$GREEDY" "$POSSESSIVE"
do
    echo "$TEXT" | pcregrep -o -e "$PATTERN"
    php -r "$SCRIPT"
done
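
As a side note on the script above, the two patterns differ only in the final quantifier: the inner quantifiers are possessive in both, and appending "+" to GREEDY turns the outer greedy "+" into the possessive "++". A minimal sketch that prints both patterns for inspection, using the same assignments as the script (the function name show_patterns is hypothetical, introduced only for this illustration):

```shell
#!/bin/sh
# Same assignments as in the report's test script.
GREEDY='\A(?:[^"]++|"(?:[^"]*+|"")*+")+'
POSSESSIVE="$GREEDY+"

# Hypothetical helper: print both patterns, one per line, so the
# one-character difference at the end is easy to see.
# '%s' is used so printf does not interpret the backslash in \A.
show_patterns() {
    printf '%s\n' "$GREEDY" "$POSSESSIVE"
}

show_patterns
```

The first line printed ends in ")+" (outer group greedy) and the second in ")++" (outer group possessive).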


Expected result:
----------------
NON QUOTED "QUOT""ED"
NON QUOTED "QUOT""ED"
NON QUOTED "QUOT""ED"
NON QUOTED "QUOT""ED"


Actual result:
--------------
NON QUOTED "QUOT""ED"
NON QUOTED "QUOT""ED"
NON QUOTED "QUOT""ED"
NON QUOTED 


History

 [2014-05-09 13:39 UTC] david dot a dot schmitt at verizon dot com
FYI, to clarify the script results you could change the TEXT setting in the script as follows:

export TEXT='NON QUOTED "QUOT""ED""NOT MATCHED'

Even with that change the expected and actual results remain the same.
 [2014-05-11 12:52 UTC] felipe@php.net
Well, could you clarify what is wrong here? Just give us the pattern which works differently from pcretest.
 [2014-05-11 12:53 UTC] felipe@php.net
-Status: Open
+Status: Feedback
 [2014-05-12 12:48 UTC] david dot a dot schmitt at verizon dot com
-Status: Feedback
+Status: Open
 [2014-05-12 12:48 UTC] david dot a dot schmitt at verizon dot com
One pattern that behaves differently is the POSSESSIVE pattern from the test script, i.e.:  \A(?:[^"]++|"(?:[^"]*+|"")*+")++

(Obviously, you must surround that pattern with // for PHP, but not for pcregrep).

Use the following text string to observe the difference:

NON QUOTED "QUOT""ED" AFTER "NOT MATCHED

echo "$TEXT" | pcregrep -o -e "$PATTERN" will display:

NON QUOTED "QUOT""ED" AFTER 

but after preg_match($pattern, $text, $matches) $matches[0] only includes:

NON QUOTED 

I originally observed the problem with a longer, more complex pattern, so this must be a more general problem than the single example pattern I am providing here.
 [2014-05-12 12:56 UTC] david dot a dot schmitt at verizon dot com
-Status: Open
+Status: Closed
 [2014-05-12 12:56 UTC] david dot a dot schmitt at verizon dot com
How odd.  I just tested with pcretest instead of pcregrep -o and got different results.  This now looks like a PCRE problem, not PHP, so I'm closing this bug.

Here's my test script for future reference:

#!/bin/sh

export PATTERN='\A(?:[^"]++|"(?:[^"]*+|"")*+")++'
export TEXT='NON QUOTED "QUOT""ED" AFTER "NOT MATCHED'

cat <<EOF |
/$PATTERN/
$TEXT
EOF
pcretest

echo "$TEXT" | pcregrep -o -e "$PATTERN"
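
For anyone repeating the comparison, the sketch below runs the same pattern and text through pcregrep, PHP and pcretest in one pass, skipping any tool that is not installed. It is only an illustration built from the commands already used in this report; the names compare_tools and PHP_SCRIPT are hypothetical. The pcregrep and PHP matches are wrapped in square brackets so that a trailing space in the match stays visible.

```shell
#!/bin/sh
# Pattern and subject text taken from the report.
PATTERN='\A(?:[^"]++|"(?:[^"]*+|"")*+")++'
TEXT='NON QUOTED "QUOT""ED" AFTER "NOT MATCHED'
export PATTERN TEXT

# PHP side: print the whole match in brackets, or a marker if nothing matched.
PHP_SCRIPT='
$t = getenv("TEXT");
$p = getenv("PATTERN");
print preg_match("/$p/", $t, $m) ? "[" . $m[0] . "]\n" : "(no match)\n";
'

# Hypothetical helper: run each available tool and label its output.
compare_tools() {
    if command -v pcregrep >/dev/null 2>&1; then
        printf 'pcregrep: [%s]\n' \
            "$(printf '%s\n' "$TEXT" | pcregrep -o -e "$PATTERN")"
    fi
    if command -v php >/dev/null 2>&1; then
        printf 'php:      %s\n' "$(php -r "$PHP_SCRIPT")"
    fi
    if command -v pcretest >/dev/null 2>&1; then
        # pcretest reads the delimited pattern, then the subject line, from stdin.
        printf '/%s/\n%s\n' "$PATTERN" "$TEXT" | pcretest
    fi
    return 0
}

compare_tools
```

What each tool prints depends on the installed PCRE and PHP versions, so no expected output is given here.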