Bug #41496 Preg_Split returns 3 matches with offset -1
Submitted: 2007-05-25 09:57 UTC Modified: 2007-05-28 15:53 UTC
From: m_v_ at gmx dot net Assigned: nlopess (profile)
Status: Not a bug Package: PCRE related
PHP Version: 5CVS-2007-05-25 (snap) OS: Windows Vista Home Premium
Private report: No CVE-ID: None
 [2007-05-25 09:57 UTC] m_v_ at gmx dot net
Description:
------------
preg_split behaves unexpectedly with certain patterns (see reproduce code):

It returns a triple of extra entries, each an empty string with the offset value -1.

It gets even worse when you replace:

 $reg = "~\{(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s";
with
 $reg = "~\{\*(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s";

it simply returns several of those empty-string, offset -1 triples.


P.S.: I hope this is actually a bug and not my ignorance of regex. I asked in a forum whether anybody could spot the mistake and got no answer, and the offset -1 makes it look like a bug to me.
If the mistake does turn out to be in my regex, it would be kind of you to let me know how to write it correctly.

Thanks in advance

Reproduce code:
---------------
$reg = "~\{(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s";
echo $reg.'<br>';
$res = preg_split( $reg, "df{g}s{literal}asd{as}d{/literal}", -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_OFFSET_CAPTURE );
echo '<pre>'.print_r($res, 1).'</pre>';

Expected result:
----------------
~\{(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s

Array
(
    [0] => Array
        (
            [0] => df
            [1] => 0
        )

    [1] => Array
        (
            [0] => g
            [1] => 3
        )

    [2] => Array
        (
            [0] => s
            [1] => 5
        )

    [3] => Array
        (
            [0] => literal
            [1] => 7
        )

    [4] => Array
        (
            [0] => asd{as}d
            [1] => 15
        )

    [5] => Array
        (
            [0] => /literal
            [1] => 24
        )

    [6] => Array
        (
            [0] => 
            [1] => 33
        )

)
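Note: one pattern that appears to give this expected output is sketched below. It is not from the original report. It assumes the bundled PCRE supports branch reset groups, written `(?|...)`, which number the capture groups of each alternative starting from the same index, so the second alternative no longer leaves three earlier groups unset. The variable names are illustrative only.

```php
<?php
// Hypothetical rewrite of the reported pattern using a branch reset group.
// Alternative 1 captures groups 1..3, alternative 2 reuses group 1.
$reg = '~\{(?|(literal)\}(.*?)\{(/literal)|(.*?))\}~s';
$subject = 'df{g}s{literal}asd{as}d{/literal}';

$res = preg_split(
    $reg,
    $subject,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE
);

// Print each piece with its byte offset in the subject.
foreach ($res as [$text, $offset]) {
    printf("'%s' at %d\n", $text, $offset);
}
```

Because the groups are renumbered, code that relied on "group 4" for the plain `{...}` case would need to look at group 1 instead.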


Actual result:
--------------
~\{(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s

Array
(
    [0] => Array
        (
            [0] => df
            [1] => 0
        )

    [1] => Array
        (
            [0] => 
            [1] => -1
        )

    [2] => Array
        (
            [0] => 
            [1] => -1
        )

    [3] => Array
        (
            [0] => 
            [1] => -1
        )

    [4] => Array
        (
            [0] => g
            [1] => 3
        )

    [5] => Array
        (
            [0] => s
            [1] => 5
        )

    [6] => Array
        (
            [0] => literal
            [1] => 7
        )

    [7] => Array
        (
            [0] => asd{as}d
            [1] => 15
        )

    [8] => Array
        (
            [0] => /literal
            [1] => 24
        )

    [9] => Array
        (
            [0] => 
            [1] => 33
        )

)
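Note: the three offset -1 entries sit exactly where capture groups 1 to 3 would appear for the first delimiter, `{g}`. A minimal sketch for inspecting what the engine reports for that first match, using preg_match with PREG_OFFSET_CAPTURE on the same pattern and subject (the loop and variable names are illustrative only):

```php
<?php
// Same pattern and subject as in the reproduce code.
$reg = '~\{(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s';
$subject = 'df{g}s{literal}asd{as}d{/literal}';

// Capture the first match together with the offset of every group.
preg_match($reg, $subject, $m, PREG_OFFSET_CAPTURE);

// Groups that did not take part in the match are reported as '' at -1.
foreach ($m as $group => [$text, $offset]) {
    printf("group %d: '%s' at %d\n", $group, $text, $offset);
}
```

Here `{g}` is matched by the second alternative, so only group 4 holds text ('g' at offset 3), while groups 1 to 3 belong to the first alternative and come back as empty strings at offset -1.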


 [2007-05-28 15:53 UTC] nlopess@php.net
In this case we only use the information that is passed from PCRE. The extra entries you are seeing appear because of the backtracking algorithms (used in current regex engines).
You can always skip those results by specifying the PREG_SPLIT_NO_EMPTY flag.
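A minimal sketch of that suggestion, reusing the pattern and subject from the report and adding PREG_SPLIT_NO_EMPTY to the flags (the variable names are illustrative only):

```php
<?php
// Same pattern and subject as in the reproduce code.
$reg = '~\{(literal)\}(.*?)\{(/literal)\}|\{(.*?)\}~s';
$subject = 'df{g}s{literal}asd{as}d{/literal}';

// PREG_SPLIT_NO_EMPTY drops every empty piece, including the offset -1 entries.
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE | PREG_SPLIT_NO_EMPTY;
$res = preg_split($reg, $subject, -1, $flags);

// Print each remaining piece with its byte offset in the subject.
foreach ($res as [$text, $offset]) {
    printf("'%s' at %d\n", $text, $offset);
}
```

Note that the flag also removes the trailing empty piece at offset 33 that the expected result listed, so it is only a fit when empty pieces carry no meaning for the caller.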
 