Bug #61054 preg_match() matching end of line in Windows-style text file makes a mistake
Submitted: 2012-02-11 09:42 UTC Modified: 2012-02-15 05:32 UTC
Votes:1
Avg. Score:1.0 ± 0.0
Reproduced:0 of 1 (0.0%)
From: gzpan123 at gmail dot com Assigned:
Status: Not a bug Package: PCRE related
PHP Version: 5.3.10 OS: Windows,linux all
Private report: No CVE-ID: None

 

 [2012-02-11 09:42 UTC] gzpan123 at gmail dot com
Description:
------------
Function: preg_match()
When processing a Windows-style text file like the example below, it gives the wrong result:
$matches[2] is followed by a <CR> character.
----
Create a text file (test.txt) on Windows with this content:
hello guo
hi jason
test


Test script:
---------------
<?php
$file = fopen("test.txt","r");
while(!feof($file)){
	$line = fgets($file);
	preg_match('/^(.+) (.+)$/',$line,$matches);
	print_r($matches);
	echo $line;
	echo $matches[1].$matches[2];
}
fclose($file);
?>


Expected result:
----------------
Array
(
    [0] => hello guo
    [1] => hello
    [2] => guo
)
hello guo
helloguo
Array
(
    [0] => hi jason
    [1] => hi
    [2] => jason
)
hi jason
hijason
Array
(
)
test

Actual result:
--------------
hello guo
Array
(
    [0] => hello guo
    [1] => hello
    [2] => guo
)
hi jason
Array
(
    [0] => hi jason
    [1] => hi
    [2] => jason
)
testArray
(
)
test

 [2012-02-11 09:47 UTC] gzpan123 at gmail dot com
-Operating System: Windows XP,Ubuntu 11.10 +Operating System: Windows,linux all
 [2012-02-11 09:47 UTC] gzpan123 at gmail dot com
This bug occurs on both Windows and Linux.
 [2012-02-12 12:42 UTC] nikic@php.net
I don't really understand the issue you have from your example, but in general LF is the compile-default linebreak for PCRE. If you want to use a different linebreak character you can either compile PCRE with the appropriate option or specify a control option like (*CRLF) before your regular expression.
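As a rough illustration of the control-option approach described above, the sketch below compares the reporter's pattern with the same pattern prefixed by (*CRLF). It assumes a PCRE build whose default newline is LF. The helper secondWord() and the sample line are hypothetical, introduced only for this example.

```php
<?php
// Hypothetical helper: returns the second captured word, or null if no match.
function secondWord(string $pattern, string $line): ?string
{
    return preg_match($pattern, $line, $m) === 1 ? $m[2] : null;
}

// Hypothetical sample: a line as fgets() would return it from a CRLF file.
$line = "abc def\r\n";

// Default newline convention (LF assumed): "." matches "\r",
// so the carriage return ends up inside the second capture.
var_dump(secondWord('/^(.+) (.+)$/', $line));

// With (*CRLF), "\r\n" is treated as the line ending, so "$" matches
// before it and "." stops in front of the "\r".
var_dump(secondWord('/(*CRLF)^(.+) (.+)$/', $line));
```

Under that assumption, the first call should return the word with a trailing "\r" and the second the bare word. The (*CRLF) option is only recognized at the very start of the pattern.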
 [2012-02-15 01:06 UTC] aharvey@php.net
-Status: Open +Status: Feedback -Package: *Regular Expressions +Package: PCRE related
 [2012-02-15 01:06 UTC] aharvey@php.net
I don't understand the issue either. The code behaves the same way for me 
regardless of the line ending in test.txt.

Can you provide a simpler example, please?
 [2012-02-15 04:44 UTC] gzpan123 at gmail dot com
Test script:
---------------
<?php
$file = fopen("test.txt","r");
while(!feof($file)){
	$line = fgets($file);
	preg_match('/^(.+) (.+)$/',$line,$matches);
	echo $line;
	if (array_key_exists(1, $matches) && array_key_exists(2, $matches)) {
		echo $matches[1].$matches[2];
	}
}
fclose($file);
?>
test.txt must be created on Windows, with this content:
abc def
ghi jklm
nop

Expected result:
----------
abc def
abcdef
ghi jklm
ghijklm
nop

Actual result:
----------
abc def
ghi jklm
nopjklm
 [2012-02-15 04:44 UTC] gzpan123 at gmail dot com
-Status: Feedback +Status: Open
 [2012-02-15 04:48 UTC] gzpan123 at gmail dot com
$matches[2] is followed by a <CR> character.
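One way to check for the trailing <CR> is to print the capture in an escaped form rather than raw. This is only a minimal sketch: the sample line is a hypothetical stand-in for what fgets() returns from a CRLF file, and an LF default newline for PCRE is assumed.

```php
<?php
// Hypothetical sample line with a Windows (CRLF) line ending.
$line = "abc def\r\n";

preg_match('/^(.+) (.+)$/', $line, $matches);

// json_encode() escapes control characters, so a carriage return shows up as \r.
echo json_encode($matches[2]), "\n";

// bin2hex() shows the raw bytes; a carriage return is 0d.
echo bin2hex($matches[2]), "\n";
```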
 [2012-02-15 05:19 UTC] rasmus@php.net
-Status: Open +Status: Not a bug
 [2012-02-15 05:19 UTC] rasmus@php.net
OK, here is my guess at what is confusing you.

By default PCRE uses \n to indicate the end of a line. Your files have \r\n at 
the end, so I think the only confusion you have is that your second match will 
have \r included in it and when you then print this on the console it looks 
essentially invisible.

In your latest example there, try changing your output line to this:

echo $matches[1].trim($matches[2]);

Does it make more sense now?
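The effect described above can be imitated without a console. The sketch below is an approximation under stated assumptions: renderLikeTerminal() and runScript() are hypothetical helpers, the terminal model only handles "\r" (return to column 0) and "\n" (new row), and the lines are supplied from an array instead of fgets().

```php
<?php
// Hypothetical helper: renders output roughly as a terminal would,
// returning one string per visible row.
function renderLikeTerminal(string $out): array
{
    $rows = [''];
    $row = 0;
    $col = 0;
    foreach (str_split($out) as $ch) {
        if ($ch === "\r") {
            $col = 0;                 // carriage return: back to column 0
        } elseif ($ch === "\n") {
            $rows[++$row] = '';       // line feed: start a new row
            $col = 0;
        } else {
            // Overwrite the character at $col, or append past the end.
            $rows[$row] = substr($rows[$row], 0, $col) . $ch
                . substr($rows[$row], $col + 1);
            $col++;
        }
    }
    return $rows;
}

// Hypothetical stand-in for the loop in the test script above.
function runScript(array $lines, bool $trimSecond): string
{
    $out = '';
    foreach ($lines as $line) {
        preg_match('/^(.+) (.+)$/', $line, $matches);
        $out .= $line;
        if (isset($matches[1], $matches[2])) {
            $second = $trimSecond ? trim($matches[2]) : $matches[2];
            $out .= $matches[1] . $second;
        }
    }
    return $out;
}

// Content of test.txt with CRLF line endings, as fgets() would return it.
$lines = ["abc def\r\n", "ghi jklm\r\n", "nop"];

print_r(renderLikeTerminal(runScript($lines, false))); // original output line
print_r(renderLikeTerminal(runScript($lines, true)));  // with trim()
```

In this model the untrimmed run ends with the row "nopjklm", as in the reported actual result, because each "\r" lets the next line overwrite the previous echo. With trim() nothing is overwritten, although the script still prints no line break after the concatenated match.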
 [2012-02-15 05:32 UTC] gzpan123 at gmail dot com
Yes, thanks. That is it.
 
Last updated: Wed Jan 15 06:01:30 2025 UTC