Bug #68310 preg_split() fail with extended modifier
Submitted: 2014-10-27 13:18 UTC Modified: 2014-10-27 15:31 UTC
From: david dot proweb at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.5.18 OS: Irrelevant
Private report: No CVE-ID: None
 [2014-10-27 13:18 UTC] david dot proweb at gmail dot com
Description:
------------
preg_split(pattern, subject) fails if I use a pattern with the extended (/x) modifier that contains escape sequences (like \r or \n), but if I double-escape them, it works. This looks like a bug to me.

If I do: /\r?\n/    over "abc\r\n123" it will split to ["abc", "123"].
If I do: /\r?\n/x   over "abc\r\n123" it will fail¹.
If I do: /\\r?\\n/x over "abc\r\n123" it will split to ["abc", "123"].

Until this is fixed, I need to escape twice when I use the /x modifier. It doesn't occur if I use preg_match(pattern, subject).

In PHP 4.x it works correctly.

¹ Compilation failed: nothing to repeat at offset 1.

Test script:
---------------
$split_normal             = "/\r?\n/";
$split_extended_escaped   = "/\r?\n/x";
$split_extended_reescaped = "/\\r?\\n/x";
$split_text               = "abc\r\n123";

var_dump(preg_split($split_normal,             $split_text)); // OK
var_dump(preg_split($split_extended_escaped,   $split_text)); // Wrongly FAIL
var_dump(preg_split($split_extended_reescaped, $split_text)); // Wrongly OK


History

 [2014-10-27 15:02 UTC] nikic@php.net
-Status: Open +Status: Not a bug
 [2014-10-27 15:02 UTC] nikic@php.net
Double-quoted strings interpret escape sequences, so what PCRE actually sees are strings like
"/
?
/" (where the first newline is supposed to be an \r and the second one an \n).

So you can either escape the backslash (as you did) or use single-quoted strings like '/\r?\n/'.
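
To make the difference concrete, here is a minimal sketch that dumps the bytes PCRE receives for each quoting style. The variable names are only illustrative and are not part of the original report.

```php
<?php
// Double-quoted: PHP replaces \r and \n with the CR (0x0d) and LF (0x0a)
// bytes before the pattern ever reaches PCRE.
$double = "/\r?\n/x";

// Single-quoted: the backslashes survive, so PCRE itself receives the
// escape sequences \r and \n.
$single = '/\r?\n/x';

echo bin2hex($double), "\n"; // 2f0d3f0a2f78
echo bin2hex($single), "\n"; // 2f5c723f5c6e2f78

// The single-quoted pattern splits as expected, even with /x.
$parts = preg_split($single, "abc\r\n123");
var_dump($parts); // ["abc", "123"]
```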
 [2014-10-27 15:31 UTC] david dot proweb at gmail dot com
Really, Nikic, you are right.
I confused this with \w and \s, but neither of those is converted in a double-quoted string, and since no conversion occurs, preg_split can use them as they are.

  \s => \s
  \w => \w
  \r => carriage return (that is ignored by /x modifier)
  \n => new line    (that is ignored by /x modifier)
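
The mapping above can be checked with a short sketch. It assumes that a pattern which fails to compile makes preg_split() return false with a warning (suppressed here with @); the bare '/?/' pattern is only an illustration of what remains once /x discards the literal CR and LF.

```php
<?php
$text = "abc\r\n123";

// \s is not a PHP string escape, so the backslash is kept as-is.
var_dump("\s" === '\s'); // bool(true)

// \r is a PHP string escape, so the double-quoted form is a single CR byte.
var_dump("\r" === '\r'); // bool(false)

// Under /x the literal CR and LF are ignored, leaving a lone "?" quantifier.
$extended = @preg_split("/\r?\n/x", $text);

// A bare "?" has nothing to repeat and fails to compile in the same way.
$bare = @preg_split('/?/', $text);

var_dump($extended); // bool(false)
var_dump($bare);     // bool(false)
```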

What I can do for this case is:

 - Option 1: convert to a single-quoted string;
 - Option 2: double-escape these characters;
 - Option 3: enclose the characters in a character class, like [\r] and [\n].
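
As a rough sketch, the three options could look like this. Option 3 relies on PCRE not ignoring whitespace inside a character class under /x; the variable names are illustrative only.

```php
<?php
$text = "abc\r\n123";

// Option 1: single-quoted string, PCRE receives the escapes \r and \n.
$option1 = '/\r?\n/x';

// Option 2: double-quoted string with the backslashes escaped.
$option2 = "/\\r?\\n/x";

// Option 3: literal CR and LF bytes wrapped in character classes, where
// the /x modifier does not discard them.
$option3 = "/[\r]?[\n]/x";

$results = [];
foreach ([$option1, $option2, $option3] as $pattern) {
    $results[] = preg_split($pattern, $text);
}

var_dump($results); // each entry is ["abc", "123"]
```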

I reported this problem because it does not occur in PHP 4.x. But I can imagine that the /x modifier has changed to become more logical, as you said.
 