Request #54223 enhance / change newline behavior
Submitted: 2011-03-11 11:23 UTC Modified: 2013-02-17 18:25 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: carsten_sttgt at gmx dot de Assigned:
Status: Wont fix Package: PCRE related
PHP Version: Irrelevant OS:
Private report: No CVE-ID: None
 [2011-03-11 11:23 UTC] carsten_sttgt at gmx dot de
Description:
------------
At the moment, the PCRE library bundled with PHP is built with the default "#define NEWLINE 10" (LF).

As a result, only "\n" is treated as a newline: "$" matches before "\n" but not before e.g. "\r\n", and "." matches "\r", so "\r" remains part of the match. This is unexpected for some people, especially on Windows. See the example below.

There are three possible solutions:
1)
Build PCRE with "#define NEWLINE -2" (ANYCRLF), or with "#define NEWLINE -1" (ANY, because PCRE is built with Unicode support anyway).

2)
Add an INI option such as "pcre.newline=any".

3)
Make PCRE_NEWLINE_ANY, PCRE_NEWLINE_CR, PCRE_NEWLINE_CRLF and PCRE_NEWLINE_LF available in userland (perhaps as a pattern modifier), as is possible when using the PCRE library from C.

(Well, 1) is not essential if 2) and 3) are available.)


Test script:
---------------
<?php
$str = "line1\r\nline2\r\nline3\r\n";

preg_match_all('/.+/', $str, $res);

var_dump($res);
?>


Expected result:
----------------
array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(5) "line1"
    [1]=>
    string(5) "line2"
    [2]=>
    string(5) "line3"
  }
}


Actual result:
--------------
(each string ends with a literal carriage return, written here as \r)
array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(6) "line1\r"
    [1]=>
    string(6) "line2\r"
    [2]=>
    string(6) "line3\r"
  }
}


History

 [2012-12-24 07:05 UTC] danielklein at airpost dot net
This is very old, well-defined behaviour.

/$/ is equivalent to /(?=\n?(?![\w\W]))/, that is, it matches just before a final newline or at the end of the string (where no characters remain). What you are asking for is /(?=\r?\n?(?![\w\W]))/ (or, for /$/m, /(?=\r\n?|\n)/), unless you want ONLY \r\n to match! This would also change the meaning of /./ (without /s) from /[^\n]/ to /[^\r\n]/, which would be confusing for people used to the old behaviour. And would /^/m then match just after a \r (but not between a \r and a \n)?
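A minimal PHP sketch of the "$" semantics described above, assuming the PCRE library uses the LF newline convention (the default in PHP's bundled build). The variable names are illustrative only:

```php
<?php
// Assumption: PCRE's default newline convention is LF, as in PHP's bundled build.
$subject = "line1\r\n";

// Without /m or /D, "$" matches only at the very end or just before a final "\n".
// The character after "line1" is "\r", so "$" cannot match there.
$plain = preg_match('/line1$/', $subject);

// Consuming the "\r" explicitly puts "$" right before the final "\n".
$withCr = preg_match('/line1\r$/', $subject);

// The lookahead spelled out above behaves like "$" at that same position.
$lookahead = preg_match('/line1\r(?=\n?(?![\w\W]))/', $subject);

var_dump($plain, $withCr, $lookahead);
```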

It would be better to specify explicitly what you are or are not matching than to change how carriage returns are treated. I agree this can make regexes harder to write and understand in some cases, but I would vote for the status quo, as I believe adding such an option would increase the overall confusion rather than reduce it. /[^\r\n]+/ is perfectly clear to anyone who understands character classes. Note that the difference between /.+/ and /.+/s is already subtle and easily missed.
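As a sketch of the explicit character-class approach suggested here, the original test script can use /[^\r\n]+/ instead of /.+/. Because CR and LF are both excluded explicitly, the result should not depend on the newline convention PCRE was built with:

```php
<?php
$str = "line1\r\nline2\r\nline3\r\n";

// The negated class rules out both CR and LF explicitly, so each match
// is just the text of one line, whatever the library's newline setting.
preg_match_all('/[^\r\n]+/', $str, $res);

var_dump($res[0]);
```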

Another alternative is to call str_replace("\r\n", "\n", $input) before applying the regex, which converts Windows-style line endings to Unix-style line endings.
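A minimal sketch of this normalization applied to the original test script, assuming the LF newline convention (PHP's bundled default):

```php
<?php
$str = "line1\r\nline2\r\nline3\r\n";

// Convert Windows-style CRLF line endings to Unix-style LF.
$normalized = str_replace("\r\n", "\n", $str);

// Only "\n" remains, so "." (without /s) stops at every line ending.
preg_match_all('/.+/', $normalized, $res);

var_dump($res[0]);
```

This only rewrites "\r\n"; lone "\r" line endings would survive. Passing arrays, as in str_replace(["\r\n", "\r"], "\n", $str), converts those too, because the searches are applied in order.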
 [2013-02-17 18:25 UTC] nikic@php.net
-Status: Open +Status: Wont fix
 [2013-02-17 18:25 UTC] nikic@php.net
These options are already available directly in the regular expression. For example, you can prefix the pattern with (*CRLF), (*ANYCRLF) or even (*ANY), depending on what you want.
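A minimal sketch of these in-pattern newline conventions, reusing the string from the original test script. The verb must appear at the very start of the pattern. The contrast with a plain "$" assumes the library's default newline convention is LF, as in PHP's bundled PCRE:

```php
<?php
$str = "line1\r\nline2\r\nline3\r\n";

// (*ANYCRLF): CR, LF and CRLF all count as newlines, so "." no longer
// matches the "\r" and each match is just the line text.
preg_match_all('/(*ANYCRLF).+/', $str, $lines);
var_dump($lines[0]);

// (*CRLF): only "\r\n" is a newline, so "$" can match just before the
// final "\r\n" of the subject.
$withVerb = preg_match('/(*CRLF)line3$/', $str);

// Without a verb (assumed LF convention), "$" needs a final "\n" right
// after "line3", but a "\r" sits there, so this does not match.
$withoutVerb = preg_match('/line3$/', $str);

var_dump($withVerb, $withoutVerb);
```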
 