Request #54223 enhance / change newline behavior
Submitted: 2011-03-11 11:23 UTC
Modified: 2013-02-17 18:25 UTC
Avg. Score: 5.0 ± 0.0
Reproduced: 1 of 1 (100.0%)
Same Version: 0 (0.0%)
Same OS: 0 (0.0%)
From: carsten_sttgt at gmx dot de
Assigned:
Status: Wont fix
Package: PCRE related
PHP Version: Irrelevant
OS:
Private report: No
CVE-ID: None
 [2011-03-11 11:23 UTC] carsten_sttgt at gmx dot de
At the moment, the PCRE bundled with PHP is built with the default "#define NEWLINE 10".

As a result, "$" matches before "\n" but not before e.g. "\r\n", and the "\r" remains part of the match. This is unexpected for some people, especially on Windows. See the example below.

There are 3 solutions:
1) Building PCRE with "#define NEWLINE -2", or with "#define NEWLINE -1" (because PCRE is still built with Unicode support)

2) Adding an INI option like "pcre.newline=any"

3) Making PCRE_NEWLINE_ANY, PCRE_NEWLINE_CR, PCRE_NEWLINE_CRLF, PCRE_NEWLINE_LF available to userland (maybe as a pattern modifier), as you can do in C using the PCRE library.

(Well, 1) is not essential if 2) and 3) are available)

Test script:
<?php
$str = "line1\r\nline2\r\nline3\r\n";

// "." does not match "\n" by default, but it does match "\r".
preg_match_all('/.+/', $str, $res);
var_dump($res);


Expected result:
array(1) {
  array(3) {
    string(5) "line1"
    string(5) "line2"
    string(5) "line3"
  }
}

Actual result (the trailing carriage return is shown as \r):
array(1) {
  array(3) {
    string(6) "line1\r"
    string(6) "line2\r"
    string(6) "line3\r"
  }
}


 [2012-12-24 07:05 UTC] danielklein at airpost dot net
This is very old, well defined behaviour.

/$/ is equivalent to /(?=\n?(?![\w\W]))/, that is, it matches just before a newline with no remaining characters, or at end-of-string (no more characters). What you are asking for is /(?=\r?\n?(?![\w\W]))/ (or /$/m => /(?=\r\n?|\n)/), unless you're asking for ONLY \r\n to match! This would also have to change the meaning of /./ (without /s) from /[^\n]/ to /[^\r\n]/, which would be confusing for people used to the old way of doing things. Would it also mean that /^/m matches just after a \r (but not in between a \r and a \n)?
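To make the default behaviour of /$/ concrete, here is a minimal sketch. The subject strings and variable names are made up for illustration; only the patterns come from the discussion above.

```php
<?php
// By default "$" matches at end-of-string or just before a final "\n".
$lf      = preg_match('/line3$/', "line3\n");       // LF ending
$crlf    = preg_match('/line3$/', "line3\r\n");     // CRLF ending: the "\r" is in the way
$explicit = preg_match('/line3\r?$/', "line3\r\n"); // optional "\r" written out

// The lookahead spelled out in the comment behaves like the plain "$".
$lookahead = preg_match('/line3(?=\n?(?![\w\W]))/', "line3\n");

var_dump($lf, $crlf, $explicit, $lookahead); // int(1) int(0) int(1) int(1)
```

Writing the optional \r into the pattern keeps the newline handling visible to whoever reads the regex later.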

It would be better to specify what you are matching or not matching rather than changing the behaviour of carriage returns. I agree this can make regexes more difficult to write and understand in some instances but I would vote to stay with the status quo as I believe adding such an option would add to the total confusion rather than decrease it. If you write /[^\r\n]+/ it is perfectly clear to anyone who understands character classes. Note that the difference between /.+/ and /.+/s is already subtle and easily missed.
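As a sketch of that suggestion, the following applies /[^\r\n]+/ to the string from the original test script and contrasts it with /.+/s. The variable names are illustrative only.

```php
<?php
// Subject string taken from the report's test script.
$str = "line1\r\nline2\r\nline3\r\n";

// Explicit character class: neither "\r" nor "\n" can be part of a match.
preg_match_all('/[^\r\n]+/', $str, $lines);

// With the /s modifier "." matches newlines too, so there is one big match.
preg_match_all('/.+/s', $str, $all);

var_dump($lines[0]);      // "line1", "line2", "line3"
var_dump(count($all[0])); // int(1)
```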

Another alternative is to call str_replace("\r\n", "\n", $input) before using the regex, which converts Windows-style line endings to Unix-style line endings.
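A minimal sketch of that normalization step follows. The input string is hypothetical, and the handling of a lone "\r" is an assumed extension that the comment above does not mention.

```php
<?php
// Hypothetical input with mixed line endings.
$input = "line1\r\nline2\rline3\n";

// The suggestion above: convert Windows (CRLF) endings to Unix (LF).
$unix = str_replace("\r\n", "\n", $input);

// Assumed extension: also convert a lone "\r". The search array is applied
// in order, so "\r\n" is replaced before any remaining "\r".
$normalized = str_replace(["\r\n", "\r"], "\n", $input);

preg_match_all('/.+/', $normalized, $res);
var_dump($res[0]); // "line1", "line2", "line3"
```

Normalizing once up front leaves the regex itself unchanged, at the cost of an extra pass over the input.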
 [2013-02-17 18:25 UTC]
-Status: Open +Status: Wont fix
 [2013-02-17 18:25 UTC]
These options are already available directly in the regular expression. E.g. you can prepend the pattern with (*CRLF), (*ANYCRLF) or even (*ANY), depending on what you want.
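For illustration, here is a minimal sketch of the (*ANYCRLF) prefix applied to the string from the original test script. The variable names are made up, and it assumes a PCRE build that supports these newline verbs.

```php
<?php
// Subject string taken from the report's test script.
$str = "line1\r\nline2\r\nline3\r\n";

// Default (LF) convention: "." stops only at "\n", so "\r" stays in each match.
preg_match_all('/.+/', $str, $default);

// (*ANYCRLF) at the very start of the pattern: CR, LF and CRLF all count as newlines.
preg_match_all('/(*ANYCRLF).+/', $str, $anycrlf);

var_dump($default[0]); // "line1\r", "line2\r", "line3\r"
var_dump($anycrlf[0]); // "line1", "line2", "line3"
```

The verb has to come first in the pattern, before any other pattern text.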