php.net
Bug #41496 Preg_Split returns 3 matches with offset -1
Submitted: 2007-05-25 09:57 UTC Modified: 2007-05-28 15:53 UTC
From: m_v_ at gmx dot net Assigned: nlopess
Status: Not a bug Package: PCRE related
PHP Version: 5CVS-2007-05-25 (snap) OS: Windows Vista Home Premium
Private report: No CVE-ID: None
 [2007-05-25 09:57 UTC] m_v_ at gmx dot net
Description:
------------
preg_split behaves unexpectedly with a certain pattern (see the reproduce code):

It returns a triple of empty-string entries, each with offset -1.

It gets even worse when you replace:

 $reg = "~\{(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s";
with
 $reg = "~\{\*(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s";

it simply returns several of those triples of empty strings with offset -1.
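A minimal sketch (assuming PHP 7.4 or later for the arrow function) that counts what the second pattern produces. The subject contains no `{*`, so only the `\{(.*?)\}` alternative can match, and each of its four matches (`{g}`, `{literal}`, `{as}`, `{/literal}`) leaves groups 1-3 unset:

```php
<?php
// Second pattern from the report: the first alternative needs "{*", which never
// occurs in the subject, so every match comes from the \{(.*?)\} alternative.
$reg = '~\{\*(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s';
$subject = 'df{g}s{literal}asd{as}d{/literal}';

$res = preg_split($reg, $subject, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE);

// Count the entries emitted for capture groups that did not participate (offset -1).
$unset = count(array_filter($res, fn (array $part): bool => $part[1] === -1));

echo count($res), " entries, ", $unset, " with offset -1\n";
// prints "21 entries, 12 with offset -1"
```

Each of the four matches contributes one triple plus the group 4 text, and the five pieces between matches (`df`, `s`, `asd`, `d` and the empty string at the end) make up the rest.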


P.S.: I hope this is actually a bug and not just my ignorance of regex. I asked in a forum whether anybody knew where the mistake is and got no answer, and the offset of -1 makes it look like a bug to me.
If it turns out to be a mistake in my regex, I would appreciate it if you could tell me how to write it correctly.

Thanks in advance

Reproduce code:
---------------
$reg = "~\{(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s";
echo $reg.'<br>';
$res = preg_split( $reg, "df{g}s{literal}asd{as}d{/literal}", -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_OFFSET_CAPTURE );
echo '<pre>'.print_r($res, 1).'</pre>';

Expected result:
----------------
~\{(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s

Array
(
    [0] => Array
        (
            [0] => df
            [1] => 0
        )

    [1] => Array
        (
            [0] => g
            [1] => 3
        )

    [2] => Array
        (
            [0] => s
            [1] => 5
        )

    [3] => Array
        (
            [0] => literal
            [1] => 7
        )

    [4] => Array
        (
            [0] => asd{as}d
            [1] => 15
        )

    [5] => Array
        (
            [0] => /literal
            [1] => 24
        )

    [6] => Array
        (
            [0] => 
            [1] => 33
        )

)


Actual result:
--------------
~\{(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s

Array
(
    [0] => Array
        (
            [0] => df
            [1] => 0
        )

    [1] => Array
        (
            [0] => 
            [1] => -1
        )

    [2] => Array
        (
            [0] => 
            [1] => -1
        )

    [3] => Array
        (
            [0] => 
            [1] => -1
        )

    [4] => Array
        (
            [0] => g
            [1] => 3
        )

    [5] => Array
        (
            [0] => s
            [1] => 5
        )

    [6] => Array
        (
            [0] => literal
            [1] => 7
        )

    [7] => Array
        (
            [0] => asd{as}d
            [1] => 15
        )

    [8] => Array
        (
            [0] => /literal
            [1] => 24
        )

    [9] => Array
        (
            [0] => 
            [1] => 33
        )

)
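The three extra entries line up with capture groups 1-3, which belong to the `{literal}...{/literal}` alternative that did not take part in matching `{g}`. A sketch (assuming PHP 7.1 or later for destructuring in `foreach`) that runs `preg_match()` with `PREG_OFFSET_CAPTURE` on the same pattern shows PCRE reporting those groups the same way:

```php
<?php
// Match only the first delimiter, "{g}", with the pattern from the report.
$reg = '~\{(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s';
preg_match($reg, 'df{g}s', $m, PREG_OFFSET_CAPTURE);

// $m[0] is the whole match; $m[1]..$m[3] belong to the alternative that did not match.
foreach ($m as $group => [$text, $offset]) {
    printf("group %d: '%s' at %d\n", $group, $text, $offset);
}
// group 0: '{g}' at 2
// group 1: '' at -1
// group 2: '' at -1
// group 3: '' at -1
// group 4: 'g' at 3
```

With `PREG_SPLIT_DELIM_CAPTURE`, preg_split() appends every group up to the highest-numbered one that matched (group 4 here), so the unset groups 1-3 come along as empty strings at offset -1.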


History

 [2007-05-28 15:53 UTC] nlopess@php.net
In this case we only use the information that is passed from PCRE. The extra entries you are seeing appear because of the backtracking algorithms used in current regex engines.
You can always skip those results by specifying the PREG_SPLIT_NO_EMPTY flag.
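`PREG_SPLIT_NO_EMPTY` also drops the legitimately empty final piece at offset 33 that the expected result lists. If that piece matters, a minimal sketch (assuming PHP 7.4 or later and that `PREG_SPLIT_OFFSET_CAPTURE` is set; the helper name `split_without_unset_groups` is hypothetical) filters out only the entries reported with offset -1:

```php
<?php
// Hypothetical helper: split with delimiter and offset capture, then drop only
// the entries whose offset is -1 (capture groups that did not participate).
function split_without_unset_groups(string $pattern, string $subject): array
{
    $parts = preg_split(
        $pattern,
        $subject,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE
    );
    // Each element is [text, offset]; keep everything except offset -1.
    $kept = array_filter($parts, fn (array $part): bool => $part[1] !== -1);
    return array_values($kept); // reindex from 0
}

$reg = '~\{(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s';
print_r(split_without_unset_groups($reg, 'df{g}s{literal}asd{as}d{/literal}'));
```

Because a group that matched an empty string still carries a real offset, only the unset groups are removed, and the output keeps the seven entries listed under "Expected result".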
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Apr 26 22:01:29 2024 UTC