Request #54223 enhance / change newline behavior
Submitted: 2011-03-11 11:23 UTC
Modified: 2013-02-17 18:25 UTC
Votes: 1
Avg. Score: 5.0 ± 0.0
Reproduced: 1 of 1 (100.0%)
Same Version: 0 (0.0%)
Same OS: 0 (0.0%)
From: carsten_sttgt at gmx dot de
Assigned:
Status: Wont fix
Package: PCRE related
PHP Version: Irrelevant
OS:
Private report: No
CVE-ID: None
 [2011-03-11 11:23 UTC] carsten_sttgt at gmx dot de
Description:
------------
At the moment, the PCRE library bundled with PHP is built with the default "#define NEWLINE 10".

As a result, "$" recognizes only "\n" as a line ending, not e.g. "\r\n", so the "\r" remains part of the match. This is unexpected for some people, especially on Windows. See the example below.

There are 3 possible solutions:

1) Building PCRE with "#define NEWLINE -2" (ANYCRLF), or with "#define NEWLINE -1" (ANY), since PCRE is already built with Unicode support.

2) Adding an INI option like "pcre.newline=any".

3) Making PCRE_NEWLINE_ANY, PCRE_NEWLINE_CR, PCRE_NEWLINE_CRLF and PCRE_NEWLINE_LF available to userland (maybe as pattern modifiers), as is possible in C when using the PCRE library.

(Well, 1) is not essential if 2) and 3) are available.)


Test script:
---------------
<?php
$str = "line1\r\nline2\r\nline3\r\n";

preg_match_all('/.+/', $str, $res);

var_dump($res);
?>


Expected result:
----------------
array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(5) "line1"
    [1]=>
    string(5) "line2"
    [2]=>
    string(5) "line3"
  }
}


Actual result:
--------------
array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(6) "line1\r"
    [1]=>
    string(6) "line2\r"
    [2]=>
    string(6) "line3\r"
  }
}


 [2012-12-24 07:05 UTC] danielklein at airpost dot net
This is very old, well-defined behaviour.

/$/ is equivalent to /(?=\n?(?![\w\W]))/, that is, it matches just before a final newline or at the end of the string (no more characters). What you are asking for is /(?=\r?\n?(?![\w\W]))/ (or, with the /m modifier, /$/m => /(?=\r\n?|\n)/), unless you are asking for ONLY \r\n to match! This would also have to change the meaning of /./ (without /s) from /[^\n]/ to /[^\r\n]/, which would be confusing for people used to the old way of doing things. Would it mean that /^/m matches just after a \r (but not between a \r and a \n)?
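
The behaviour described above can be checked with a short sketch. It assumes PCRE's default newline setting (LF), as stated in the report; a build with a different NEWLINE default would behave differently. The variable names are illustrative only.

```php
<?php
// "$" (without /m) matches at the very end, or just before a final "\n".
$lfOnly = preg_match('/line1$/', "line1\n");

// With a Windows line ending the "\r" sits between "line1" and the final "\n",
// so "$" cannot match directly after "line1".
$crlf = preg_match('/line1$/', "line1\r\n");

// Spelling out the "\r" in the pattern makes the match succeed again.
$crlfExplicit = preg_match('/line1\r$/', "line1\r\n");

var_dump($lfOnly, $crlf, $crlfExplicit); // int(1), int(0), int(1)
```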

It would be better to specify what you are matching or not matching rather than changing the behaviour of carriage returns. I agree this can make regexes more difficult to write and understand in some instances but I would vote to stay with the status quo as I believe adding such an option would add to the total confusion rather than decrease it. If you write /[^\r\n]+/ it is perfectly clear to anyone who understands character classes. Note that the difference between /.+/ and /.+/s is already subtle and easily missed.
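
As a minimal sketch of the character-class approach, the test script from the report can be changed to exclude both "\r" and "\n" explicitly. The input string is the one from the report; nothing else is assumed.

```php
<?php
// Same input as the test script in the report: Windows-style line endings.
$str = "line1\r\nline2\r\nline3\r\n";

// Match runs of characters that are neither CR nor LF,
// instead of relying on what "." treats as a newline.
preg_match_all('/[^\r\n]+/', $str, $res);

var_dump($res[0]); // "line1", "line2", "line3" (5 characters each)
```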

Another alternative is to call str_replace("\r\n", "\n", $input) before applying the regex, which converts Windows-style line endings to Unix-style line endings.
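
A short sketch of the normalization approach, again using the input from the report. The variable name $normalized is illustrative only.

```php
<?php
$str = "line1\r\nline2\r\nline3\r\n";

// Convert Windows line endings to Unix line endings first.
$normalized = str_replace("\r\n", "\n", $str);

// The original pattern from the report now behaves as expected,
// because "." only has to stop at "\n".
preg_match_all('/.+/', $normalized, $res);

var_dump($res[0]); // "line1", "line2", "line3"
```

Note that this only handles "\r\n" pairs; a lone "\r" in the input would still be matched by ".".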
 [2013-02-17 18:25 UTC] nikic@php.net
-Status: Open +Status: Wont fix
 [2013-02-17 18:25 UTC] nikic@php.net
These options are already available directly in the regular expression. E.g. you can prepend the pattern with (*CRLF) or (*ANYCRLF) or even (*ANY), depending on what you want.
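
For illustration, here is a sketch of the test script from the report with (*ANYCRLF) prepended to the pattern. It assumes a PCRE version that supports these newline verbs at the start of a pattern; the variable names are illustrative only.

```php
<?php
$str = "line1\r\nline2\r\nline3\r\n";

// (*ANYCRLF) makes CR, LF and CRLF all count as line endings for this pattern,
// so "." no longer matches the "\r".
preg_match_all('/(*ANYCRLF).+/', $str, $res);

var_dump($res[0]); // "line1", "line2", "line3"

// The same setting also affects "^" and "$" in multiline mode.
preg_match_all('/(*ANYCRLF)^line\d$/m', $str, $lines);

var_dump(count($lines[0])); // int(3)
```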