Bug #68310 preg_split() fail with extended modifier
Submitted: 2014-10-27 13:18 UTC Modified: 2014-10-27 15:31 UTC
From: david dot proweb at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.5.18 OS: Irrelevant
Private report: No CVE-ID: None
 [2014-10-27 13:18 UTC] david dot proweb at gmail dot com
Description:
------------
preg_split(pattern, subject) fails when the pattern uses the extended (/x) modifier together with escape sequences such as \r or \n. If I double-escape them, it works. This looks like a bug to me.

If I use /\r?\n/    on "abc\r\n123", it splits into ["abc", "123"].
If I use /\r?\n/x   on "abc\r\n123", it fails¹.
If I use /\\r?\\n/x on "abc\r\n123", it splits into ["abc", "123"].

Until this is fixed, I have to escape twice whenever I use the /x modifier. The problem does not occur with preg_match(pattern, subject).

In PHP 4.x it works correctly.

¹ Compilation failed: nothing to repeat at offset 1.

Test script:
---------------
$split_normal             = "/\r?\n/";
$split_extended_escaped   = "/\r?\n/x";
$split_extended_reescaped = "/\\r?\\n/x";
$split_text               = "abc\r\n123";

var_dump(preg_split($split_normal,             $split_text)); // OK
var_dump(preg_split($split_extended_escaped,   $split_text)); // Wrongly FAIL
var_dump(preg_split($split_extended_reescaped, $split_text)); // Wrongly OK


 [2014-10-27 15:02 UTC] nikic@php.net
-Status: Open +Status: Not a bug
 [2014-10-27 15:02 UTC] nikic@php.net
Double-quoted strings interpret escape sequences, so what PCRE actually sees is a string like
"/
?
/" (where the first line break is supposed to be a \r and the second one a \n).

So you can either escape the backslash (as you did) or use single-quoted strings like '/\r?\n/'.
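
To make the difference visible, here is a minimal sketch that dumps the bytes PHP hands to PCRE for each quoting style. The variable names are only illustrative, and the `@` operator is used just to silence the compilation warning so the `false` return value can be inspected.

```php
<?php
// Double-quoted: PHP turns \r and \n into real CR and LF bytes
// before PCRE ever sees the pattern.
$double = "/\r?\n/x";
// Single-quoted: the backslashes survive, so PCRE receives the
// escape sequences \r and \n and interprets them itself.
$single = '/\r?\n/x';

echo bin2hex($double), "\n"; // 2f0d3f0a2f78     -> / CR ? LF / x
echo bin2hex($single), "\n"; // 2f5c723f5c6e2f78 -> / \ r ? \ n / x

$text = "abc\r\n123";

// With /x the literal CR and LF are skipped as whitespace, which
// leaves a bare "?" with nothing to repeat, so compilation fails.
$failed = @preg_split($double, $text);
var_dump($failed); // bool(false)

// The single-quoted pattern compiles and splits as expected.
$result = preg_split($single, $text);
var_dump($result); // ["abc", "123"]
```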
 [2014-10-27 15:31 UTC] david dot proweb at gmail dot com
Really, Nikic, you are right.
I confused these with \w and \s, but neither of those is converted in a double-quoted string, so preg_split receives them exactly as written.

  \s => \s
  \w => \w
  \r => carriage return (that is ignored by /x modifier)
  \n => new line        (that is ignored by /x modifier)

What I can do in this case:

 - Option 1: convert to a single-quoted string;
 - Option 2: double-escape these characters;
 - Option 3: wrap the characters in a character class, like [\r] and [\n];
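
A quick sketch comparing the three options side by side, assuming the same subject string as in the test script. The array keys are just labels chosen for this example. Option 3 works because whitespace inside a character class is not skipped by the /x modifier.

```php
<?php
$text = "abc\r\n123";

// One pattern per option; all of them use the /x modifier.
$patterns = [
    'single-quoted'   => '/\r?\n/x',      // PCRE interprets \r and \n
    'double-escaped'  => "/\\r?\\n/x",    // same bytes as the line above
    'character class' => "/[\r]?[\n]/x",  // literal CR/LF kept inside [...]
];

foreach ($patterns as $label => $pattern) {
    $parts = preg_split($pattern, $text);
    echo $label, ': ', json_encode($parts), "\n";
}
// Each line should print ["abc","123"]
```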

I reported this because the problem does not occur in PHP 4.x. But I can imagine that the /x modifier was changed to behave more logically, as you said.
 