Bug #45850 preg_split does not correctly split the sequence "\n\t" with the u modifier
Submitted: 2008-08-18 11:29 UTC Modified: 2008-08-20 00:52 UTC
From: thunder013 at yopmail dot com Assigned:
Status: Closed Package: PCRE related
PHP Version: 5.2.6 OS: *
Private report: No CVE-ID: None

 [2008-08-18 11:29 UTC] thunder013 at yopmail dot com
Description:
------------
I want to use preg_split with the u modifier to split a UTF-8 string into individual characters, like this: preg_split('//u', $txt, -1, PREG_SPLIT_NO_EMPTY);

It works fine in general. However, there is a bug when the string contains the sequence "\n\t" (bytes 0x0A 0x09): the two characters are NOT split (see the example attached).

Note that this bug isn't present when preg_split is used without the u modifier.

This bug was reported previously here for version 5.2.4: http://bugs.php.net/bug.php?id=42737, but it is nevertheless *still* present in versions 5.2.5 and 5.2.6!

Reproduce code:
---------------
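<?php

// Split a UTF-8 string into individual characters using an empty pattern.
$txt = "abc\n\txyz!";
$tab = preg_split('//u', $txt, -1, PREG_SPLIT_NO_EMPTY);
print_r($tab);
// $tab[3] should hold only "\n" (len 1, hex 0a).
echo '$tab[3]: len = ', strlen($tab[3]), ', hex = ', bin2hex($tab[3]), "\n";
?>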


Expected result:
----------------
Array
(
    [0] => a
    [1] => b
    [2] => c
    [3] => 

    [4] => 	
    [5] => x
    [6] => y
    [7] => z
    [8] => !
)
$tab[3]: len = 1, hex = 0a


Actual result:
--------------
$ php test.php
Array
(
    [0] => a
    [1] => b
    [2] => c
    [3] => 
	
    [4] => x
    [5] => y
    [6] => z
    [7] => !
)
$tab[3]: len = 2, hex = 0a09

 [2008-08-18 11:47 UTC] thunder013 at yopmail dot com
Oh... In fact, bug #42737, which I linked, concerns the sequence "\n\r", not "\n\t"...
 [2008-08-18 22:09 UTC] felipe@php.net
Bug #42737 was fixed in 5_3 and HEAD.
Using your example code on 5.3, I got the expected result.
 [2008-08-20 00:52 UTC] jani@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

Merged the VERY simple fix to PHP_5_2. 
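
For anyone who cannot move to a build that includes the fix, the per-character split described in the report can also be written without an empty pattern. The following is only a sketch under assumptions: utf8_chars is a hypothetical helper name (not a PHP function), and it has not been checked against the affected 5.2.x builds. It matches each character with preg_match_all instead of splitting on '//u', so it does not depend on how preg_split handles empty matches.

```php
<?php
// Hypothetical helper (not part of PHP): return the characters of a
// UTF-8 string by matching each one, rather than splitting on '//u'.
function utf8_chars($txt)
{
    // The s modifier lets "." match "\n"; u treats pattern and subject as UTF-8.
    if (preg_match_all('/./su', $txt, $m) === false) {
        return array(); // matching failed, e.g. on malformed input
    }
    return $m[0]; // full-pattern matches, one per character
}

$tab = utf8_chars("abc\n\txyz!");
echo count($tab), "\n";                              // prints 9
echo bin2hex($tab[3]), ' ', bin2hex($tab[4]), "\n";  // prints 0a 09
```

Unlike preg_split with PREG_SPLIT_NO_EMPTY, this returns an empty array both for an empty string and on a matching error, so callers that need to tell the two apart should check the return value of preg_match_all themselves.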
 