At the moment, the PCRE bundled with PHP is built with the default "#define NEWLINE 10".
As a result, "$" matches only before "\n", not before "\r\n", so the "\r" remains part of the match. This is unexpected for some people, especially on Windows. See the example below.
There are 3 solutions:
1) Building PCRE with "#define NEWLINE -2", or with "#define NEWLINE -1" (because PCRE is still built with Unicode support)
2) Adding an INI option like "pcre.newline=any"
3) Making PCRE_NEWLINE_ANY, PCRE_NEWLINE_CR, PCRE_NEWLINE_CRLF, PCRE_NEWLINE_LF available to userland (maybe as pattern modifiers), as you can already do in C using the PCRE library.
(Well, 1) is not essential if 2) and 3) are available.)
$str = "line1\r\nline2\r\nline3\r\n";
preg_match_all('/.+/', $str, $res);
var_dump($res);

Output (abridged; each match still ends in "\r"):
string(6) "line1\r"
string(6) "line2\r"
string(6) "line3\r"
This is very old, well defined behaviour.
/$/ is equivalent to /(?=\n?(?![\w\W]))/, that is, it matches just before a newline with no remaining characters, or at end-of-string (no more characters). What you are asking for is /(?=\r?\n?(?![\w\W]))/ (or /$/m => /(?=\r\n?|\n)/), unless you're asking for ONLY \r\n to match! This would also have to change the meaning of /./ (without /s) from /[^\n]/ to /[^\r\n]/. This would be confusing for people used to the old way of doing things. Would this mean that /^/m would match just after a \r? (but not in between a \r and a \n???)
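To make the current meaning of /$/ concrete, here is a minimal sketch. It assumes a PCRE build with the default LF newline setting, as described in this report; the variable names are only illustrative.

```php
<?php
// Assumes the default newline setting (LF), as described in this report.
$atEnd      = preg_match('/line3$/', "line3");        // "$" at end of string
$beforeLf   = preg_match('/line3$/', "line3\n");      // "$" before a final "\n"
$beforeCrLf = preg_match('/line3$/', "line3\r\n");    // "\r" sits between "line3" and "\n"
$explicitCr = preg_match('/line3\r?$/', "line3\r\n"); // optional "\r" spelled out in the pattern

var_dump($atEnd, $beforeLf, $beforeCrLf, $explicitCr);
```

Under that assumption, only the third call fails to match, because the "\r" is treated as an ordinary character in front of the "\n".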
It would be better to specify what you are matching or not matching rather than changing the behaviour of carriage returns. I agree this can make regexes more difficult to write and understand in some instances but I would vote to stay with the status quo as I believe adding such an option would add to the total confusion rather than decrease it. If you write /[^\r\n]+/ it is perfectly clear to anyone who understands character classes. Note that the difference between /.+/ and /.+/s is already subtle and easily missed.
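As a sketch of the explicit character class suggested above, compared with the original pattern from the report (again assuming the default LF newline setting; $dot and $cls are illustrative names):

```php
<?php
$str = "line1\r\nline2\r\nline3\r\n";

// "." excludes only "\n" by default, so each match keeps the "\r".
preg_match_all('/.+/', $str, $dot);

// The explicit class excludes both "\r" and "\n".
preg_match_all('/[^\r\n]+/', $str, $cls);

var_dump($dot[0]);
var_dump($cls[0]);
```

The second pattern says exactly which characters end a match, so it should behave the same regardless of how the PCRE newline convention is configured.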
Another alternative is str_replace("\r\n", "\n", $input) before using the regex which will convert Windows style line endings to Unix style line endings.
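A minimal sketch of that normalization step, using the string from the report. The second variant, which also converts lone "\r" characters, is an addition not mentioned in the comment above and may not be wanted in every case.

```php
<?php
$input = "line1\r\nline2\r\nline3\r\n";

// Convert Windows line endings to Unix line endings before matching.
$normalized = str_replace("\r\n", "\n", $input);
preg_match_all('/.+/', $normalized, $res);
var_dump($res[0]);

// Optional variant (an assumption, not part of the suggestion above):
// also convert any remaining lone "\r". The search strings are applied in order.
$normalizedAll = str_replace(["\r\n", "\r"], "\n", "a\r\nb\rc\n");
var_dump($normalizedAll);
```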
These options are already available directly in the regular expression. E.g. you can prepend it with (*CRLF) or (*ANYCRLF) or even (*ANY) depending on what you want.
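For illustration, a sketch of the (*ANYCRLF) prefix applied to the pattern from the report. It assumes a PCRE version recent enough to support these newline prefixes; the prefix has to be the very first thing in the pattern.

```php
<?php
$str = "line1\r\nline2\r\nline3\r\n";

// With (*ANYCRLF), "\r", "\n" and "\r\n" all count as line endings,
// so "." stops before the "\r".
preg_match_all('/(*ANYCRLF).+/', $str, $res);
var_dump($res[0]);

// The anchors follow the same convention in multiline mode.
$withPrefix    = preg_match('/(*ANYCRLF)^line2$/m', $str);
$withoutPrefix = preg_match('/^line2$/m', $str); // default LF convention assumed
var_dump($withPrefix, $withoutPrefix);
```

Since the newline convention is chosen per pattern, existing regexes that rely on the default behaviour are not affected.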