Bug #44463 preg_match_all makes extra matches on empty alternatives
Submitted: 2008-03-17 22:13 UTC
Modified: 2008-03-17 23:25 UTC
From: php at sameprecision dot org
Assigned:
Status: Not a bug
Package: PCRE related
PHP Version: 5.2.5
OS: irrelevant
Private report: No
CVE-ID: None
 [2008-03-17 22:13 UTC] php at sameprecision dot org
Description:
------------
Performing a global match with an empty alternative in one subpattern, followed by a $ alternative in the next subpattern, results in an "extra" match.

In the example below, for the subject 'a,a', I believe that preg_match_all should match the second 'a' with the first subpattern, match the end-of-subject assertion with the second subpattern, and then treat the global search as finished.

I've tried this with PHP 5.2.5 / PCRE 7.3 on Linux.

Reproduce code:
---------------
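$regex = '/(a|)(,|$)/';

// Subject ends with a comma: two matches are reported, as expected.
echo preg_match_all($regex, 'a,', $m1, PREG_SET_ORDER) . "\n";
print_r($m1);
// Subject ends with 'a': three matches are reported where two were expected.
echo preg_match_all($regex, 'a,a', $m2, PREG_SET_ORDER) . "\n";
print_r($m2);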

Expected result:
----------------
2
Array
(
    [0] => Array
        (
            [0] => a,
            [1] => a
            [2] => ,
        )

    [1] => Array
        (
            [0] => 
            [1] => 
            [2] => 
        )

)
2
Array
(
    [0] => Array
        (
            [0] => a,
            [1] => a
            [2] => ,
        )

    [1] => Array
        (
            [0] => a
            [1] => a
            [2] => 
        )

)



Actual result:
--------------
2
Array
(
    [0] => Array
        (
            [0] => a,
            [1] => a
            [2] => ,
        )

    [1] => Array
        (
            [0] => 
            [1] => 
            [2] => 
        )

)
3
Array
(
    [0] => Array
        (
            [0] => a,
            [1] => a
            [2] => ,
        )

    [1] => Array
        (
            [0] => a
            [1] => a
            [2] => 
        )

    [2] => Array
        (
            [0] => 
            [1] => 
            [2] => 
        )

)
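
To make it easier to see where the third match comes from, the same call can be repeated with PREG_OFFSET_CAPTURE, which reports the byte offset of every match. This is only an illustrative sketch; the variable name $positions is not part of the original report.

```php
<?php
$regex = '/(a|)(,|$)/';

// PREG_OFFSET_CAPTURE turns every captured string into [string, byte offset].
preg_match_all($regex, 'a,a', $m, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

$positions = [];
foreach ($m as $i => $set) {
    [$text, $offset] = $set[0];   // whole-pattern match and where it starts
    $positions[] = $offset;
    printf("match %d: '%s' at offset %d\n", $i, $text, $offset);
}
```

The last match is the empty string found at the very end of the subject, after the match for the second 'a' has already consumed everything up to that point.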


History

 [2008-03-17 22:41 UTC] felipe@php.net
This result is expected.

Using pcretest:
PCRE version 7.5-RC2 2007-12-27

  re> /(a|)(,|$)/g
data> a,
 0: a,
 1: a
 2: ,
 0: 
 1: 
 2: 
data> a,a
 0: a,
 1: a
 2: ,
 0: a
 1: a
 2: 
 0: 
 1: 
 2: 

Thanks.
 [2008-03-17 22:49 UTC] php at sameprecision dot org
Why would you expect three matches for 'a,a'?
 [2008-03-17 23:19 UTC] felipe@php.net
Well, at least it isn't a PHP bug.
However, it seems to me that the $ is matching twice...

File a bug in the PCRE bug repository:
http://bugs.exim.org/enter_bug.cgi?product=PCRE
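
One way to see why the $ appears to match twice is to replay the global search one attempt at a time with preg_match and an explicit start offset. The function below is a simplified sketch and not PHP's actual implementation: global_match_sketch is a hypothetical name, and after an empty match it simply steps forward one byte, whereas the real code first retries at the same position while requiring a non-empty match.

```php
<?php
// Simplified emulation of a global search; returns [matched text, offset] pairs.
function global_match_sketch(string $regex, string $subject): array
{
    $found  = [];
    $offset = 0;
    $length = strlen($subject);

    // Note "<=": one attempt is still made at the very end of the subject.
    while ($offset <= $length) {
        if (preg_match($regex, $subject, $m, PREG_OFFSET_CAPTURE, $offset) !== 1) {
            break;
        }
        [$text, $pos] = $m[0];
        $found[] = [$text, $pos];

        // Continue after the match; step one byte after an empty match
        // so that the loop cannot stall on the same position.
        $offset = $pos + strlen($text);
        if ($text === '') {
            $offset++;
        }
    }
    return $found;
}

print_r(global_match_sketch('/(a|)(,|$)/', 'a,a'));
```

For 'a,a' the second attempt ends at offset 3 because $ succeeds there, and the following attempt starts at offset 3, where the empty alternative plus $ succeeds again.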
 [2008-03-17 23:25 UTC] php at sameprecision dot org
I am asking the PCRE developers about it...
A global search SHOULD end if the last match matched an end-of-subject assertion...
What actually happens is that, at the end of the subject string, a global search makes one more attempt: if the regex can match the empty string there, it produces this additional match. That behavior is ambiguous.
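
If the extra empty match is unwanted, one possible workaround (a sketch, not something proposed in this report) is to allow a match to begin only at the start of the subject or directly after a comma. The pattern in $anchored is a hypothetical variant of the reported regex, and it assumes the comma-separated shape of the two test subjects.

```php
<?php
// Hypothetical variant of the reported pattern: a match may only begin at the
// start of the subject or immediately after a comma (lookbehind assertion).
$anchored = '/(?:^|(?<=,))(a|)(,|$)/';

$count1 = preg_match_all($anchored, 'a,', $m1, PREG_SET_ORDER);
$count2 = preg_match_all($anchored, 'a,a', $m2, PREG_SET_ORDER);

echo $count1, "\n";
print_r($m1);
echo $count2, "\n";
print_r($m2);
```

With this variant, the attempt at the end of 'a,a' fails because the preceding character is 'a' rather than a comma, while the trailing empty field of 'a,' is still reported.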
 