[2011-03-11 11:23 UTC] carsten_sttgt at gmx dot de
Description:
------------
At the moment the PCRE bundled with PHP is built with the default "#define NEWLINE 10".
As a result, "$" matches only before "\n" and not before "\r\n", so the "\r" remains part of the match. This is unexpected for some people, especially on Windows. See the example below.
There are 3 solutions:
1)
Building PCRE with "#define NEWLINE -2" (ANYCRLF), or with "#define NEWLINE -1" (ANY, since PCRE is built with Unicode support anyway)
2)
Adding an INI option like "pcre.newline=any"
3)
Making PCRE_NEWLINE_ANY, PCRE_NEWLINE_CR, PCRE_NEWLINE_CRLF, PCRE_NEWLINE_LF available to userland (maybe as pattern modifiers), as you can in C when using the PCRE library directly.
(Well, 1) is not essential if 2) and 3) are available.)
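For what it is worth, something close to 3) can be approximated from userland, assuming the PCRE version in use supports newline-convention verbs at the start of a pattern. The following is only a sketch using the string from the test script; it is not a proposal for how the PHP extension should expose the option.

```php
<?php
// Same input as in the test script: three lines with Windows (CRLF) endings.
$str = "line1\r\nline2\r\nline3\r\n";

// (*ANYCRLF) has to be the very first thing in the pattern. It asks PCRE to
// treat CR, LF and CRLF as newlines for this pattern only, so "." no longer
// matches "\r".
preg_match_all('/(*ANYCRLF).+/', $str, $res);

// Each entry is 5 characters long, without a trailing "\r".
var_dump($res[0]);
```

The verbs (*CR), (*LF), (*CRLF) and (*ANY) work the same way and correspond to the other PCRE_NEWLINE_* options.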
Test script:
---------------
<?php
$str = "line1\r\nline2\r\nline3\r\n";
preg_match_all('/.+/', $str, $res);
var_dump($res);
?>
Expected result:
----------------
array(1) {
[0]=>
array(3) {
[0]=>
string(5) "line1"
[1]=>
string(5) "line2"
[2]=>
string(5) "line3"
}
}
Actual result:
--------------
array(1) {
[0]=>
array(3) {
[0]=>
string(6) "line1\r"
[1]=>
string(6) "line2\r"
[2]=>
string(6) "line3\r"
}
}
This is very old, well-defined behaviour. /$/ is equivalent to /(?=\n?(?![\w\W]))/, that is, it matches at end-of-string (no more characters) or just before a newline with no remaining characters after it. What you are asking for is /(?=\r?\n?(?![\w\W]))/ (or /$/m => /(?=\r\n?|\n)/), unless you are asking for ONLY \r\n to match!
This would also have to change the meaning of /./ (without /s) from /[^\n]/ to /[^\r\n]/, which would be confusing for people used to the old way of doing things. Would it mean that /^/m matches just after a \r (but not between a \r and a \n)?
It would be better to specify what you are or are not matching rather than change the behaviour of carriage returns. I agree this can make regexes more difficult to write and understand in some instances, but I would vote to stay with the status quo, as I believe adding such an option would increase the total confusion rather than decrease it. If you write /[^\r\n]+/, it is perfectly clear to anyone who understands character classes. Note that the difference between /.+/ and /.+/s is already subtle and easily missed.
Another alternative is to call str_replace("\r\n", "\n", $input) before using the regex, which converts Windows-style line endings to Unix-style line endings.
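The two workarounds mentioned above could look like this. It is only a sketch against the string from the original test script, and the variable names are made up for illustration.

```php
<?php
$input = "line1\r\nline2\r\nline3\r\n";

// Workaround 1: say explicitly what a line may not contain,
// instead of relying on what "." happens to exclude.
preg_match_all('/[^\r\n]+/', $input, $byClass);

// Workaround 2: convert Windows line endings to Unix ones first,
// then use the original pattern unchanged.
$normalized = str_replace("\r\n", "\n", $input);
preg_match_all('/.+/', $normalized, $byNormalizing);

// Both approaches yield the same three matches without a trailing "\r".
var_dump($byClass[0] === $byNormalizing[0]);
```

Note that the str_replace() variant only handles "\r\n"; a lone "\r" in the input would still end up inside a match, whereas the character class excludes it.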