Bug #60380 preg_split and old version split
Submitted: 2011-11-25 07:15 UTC Modified: 2012-08-31 05:57 UTC
From: evrinoma at gmail dot com Assigned:
Status: Closed Package: *Programming Data Structures
PHP Version: 5.3.8 OS: Linux CentOS 5, CentOS 6
Private report: No CVE-ID: None
 [2011-11-25 07:15 UTC] evrinoma at gmail dot com
Description:
------------
--splite------- 
 array[0]=0123456789-- 
 array[1]=qwertyuiop-- 
 array[2]=9876543210-- 
 array[3]=poiuytrewq-- 
 array[4]=0123456789-- 
 array[5]=asdfghjkl-- 
 --preg_splite-- 
 array[0]=0123456789\qwertyuiop-- <<<<<<<<< ???
 array[1]=9876543210-- 
 array[2]=poiuytrewq-- 
 array[3]=0123456789-- 
 array[4]=asdfghjkl-- 
 array[5]=-- 
 ---------------

Test script:
---------------
<HTML>
<HEAD>
<TITLE> BUG </TITLE>
</HEAD>
<BODY>
<? 
printf(" --splite------- <BR>\n");

$date="0123456789\\qwertyuiop 9876543210.poiuytrewq:0123456789 asdfghjkl";
$var = split( "[-\\. :]",$date );

printf(" array[0]=%s-- <BR>\n",$var[0]);
printf(" array[1]=%s-- <BR>\n",$var[1]);
printf(" array[2]=%s-- <BR>\n",$var[2]);
printf(" array[3]=%s-- <BR>\n",$var[3]);
printf(" array[4]=%s-- <BR>\n",$var[4]);
printf(" array[5]=%s-- <BR>\n",$var[5]);

printf(" --preg_splite-- <BR>\n");

$var = preg_split( "#[-\\. :]#",$date);

printf(" array[0]=%s-- <BR>\n",$var[0]);
printf(" array[1]=%s-- <BR>\n",$var[1]);
printf(" array[2]=%s-- <BR>\n",$var[2]);
printf(" array[3]=%s-- <BR>\n",$var[3]);
printf(" array[4]=%s-- <BR>\n",$var[4]);
printf(" array[5]=%s-- <BR>\n",$var[5]);
printf(" --------------- <BR>\n");

?>
</BODY>
</HTML>



 [2012-08-30 19:02 UTC] cjbf at cam dot ac dot uk
Put simply, preg_split treats "\\" wrongly. It ought to search for an actual "\" but never matches it.
 [2012-08-30 19:24 UTC] cjbf at cam dot ac dot uk
Three \s are needed to match one \. Could that at least be documented if it's not wrong?
$text='AA\\nB and B\r';// contains \n and \r
print_r(preg_split('/\\\n|\\\r/', $text));
gives
Array
(
    [0] => AA
    [1] => B and B
    [2] => 
)
 [2012-08-30 19:55 UTC] nikic@php.net
You are dealing with two levels of unescaping here. The first is the usual one for PHP string literals; the second is done additionally by PCRE.

For example, if you write "\\\\", the resulting string is just "\\" (because each \\ is unescaped to \). This is then passed to PCRE, which does its own unescaping, so the \\ is turned into \. That is why you need four backslashes to produce one literal backslash in a regular expression.
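
A minimal sketch of the two levels, for illustration only. The variable names ($pattern, $subject, $parts) are made up for this example and do not come from the report.

```php
<?php
// Level 1: PHP string parsing. Four backslashes in the source
// become two backslash characters in the resulting string.
$pattern = "#\\\\#";

// The string now holds '#', '\', '\', '#'.
$patternLength = strlen($pattern); // 4

// Level 2: PCRE reads the two-character sequence \\ as one literal backslash.
$subject = 'a\\b'; // single-quoted: the characters a, \, b
$parts = preg_split($pattern, $subject);

print_r($parts); // Array ( [0] => a [1] => b )
```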

In your particular case three work too, but only because the following character does not form an escape sequence.
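
Applied to the script in the report, this can be sketched as follows. It is an illustration, not an official fix: the variable names $original, $fixed and $three are hypothetical. The likely reason split() behaved differently is that POSIX bracket expressions treat a backslash as an ordinary character, whereas PCRE reads \. inside a class as an escaped dot. split() itself is omitted here because it was deprecated in PHP 5.3 and removed in PHP 7.

```php
<?php
// Subject from the report: one literal backslash between the first two fields.
$date = "0123456789\\qwertyuiop 9876543210.poiuytrewq:0123456789 asdfghjkl";

// Original pattern: PHP turns "\\." into \. and PCRE reads \. as an escaped dot,
// so the class only contains '-', '.', ' ' and ':'. The backslash is not a separator.
$original = preg_split("#[-\\. :]#", $date);

// Four backslashes in the source hand PCRE the sequence \\ (a literal backslash).
$fixed = preg_split("#[-\\\\. :]#", $date);

// Three backslashes, as in the earlier comment: in a single-quoted string \n is
// not an escape sequence, so '\\\n' still reaches PCRE as \\n.
$three = preg_split('/\\\n|\\\r/', 'AA\\nB and B\r');

echo count($original), "\n"; // 5
echo count($fixed), "\n";    // 6
print_r($fixed);
```

With the four-backslash pattern, the first two elements are 0123456789 and qwertyuiop, matching the split() output shown in the description.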
 [2012-08-31 05:57 UTC] evrinoma at gmail dot com
Many thanks for the help.
 [2012-08-31 05:57 UTC] evrinoma at gmail dot com
-Status: Open
+Status: Closed