Bug #45850 preg_split doesn't correctly split the sequence "\n\t" with the u modifier
Submitted: 2008-08-18 11:29 UTC Modified: 2008-08-20 00:52 UTC
From: thunder013 at yopmail dot com Assigned:
Status: Closed Package: PCRE related
PHP Version: 5.2.6 OS: *
Private report: No CVE-ID: None
 [2008-08-18 11:29 UTC] thunder013 at yopmail dot com
Description:
------------
I want to use preg_split with the u modifier to split a UTF-8 string into individual characters, like this: preg_split('//u', $txt, -1, PREG_SPLIT_NO_EMPTY);

This works fine except when the string contains the sequence "\n\t" (bytes 0x0A 0x09): those two characters are NOT split apart (see the example below).

Note that this bug does not occur when preg_split is used without the u modifier.

This bug was reported previously for version 5.2.4 (http://bugs.php.net/bug.php?id=42737), but it is nevertheless *still* present in versions 5.2.5 and 5.2.6!

Reproduce code:
---------------
<?php

        $txt = "abc\n\txyz!";
        $tab = preg_split('//u', $txt, -1, PREG_SPLIT_NO_EMPTY);
        print_r($tab);
        echo '$tab[3]: len = ', strlen($tab[3]), ', hex = ', bin2hex($tab[3]), "\n";
?>


Expected result:
----------------
Array
(
    [0] => a
    [1] => b
    [2] => c
    [3] => 

    [4] => 	
    [5] => x
    [6] => y
    [7] => z
    [8] => !
)
$tab[3]: len = 1, hex = 0a


Actual result:
--------------
$ php test.php
Array
(
    [0] => a
    [1] => b
    [2] => c
    [3] => 
	
    [4] => x
    [5] => y
    [6] => z
    [7] => !
)
$tab[3]: len = 2, hex = 0a09
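On PHP versions affected by this bug, one possible workaround (not proposed in the report itself) is to match each character with preg_match_all() instead of splitting on an empty pattern. This sidesteps preg_split's handling of empty matches entirely. The sketch below assumes the same input string as the reproduce code. With /s, "." also matches "\n", and with /u it consumes one whole UTF-8 character. On PHP 7.4 and later, mb_str_split() is a built-in alternative.

```php
<?php
// Workaround sketch (an assumption, not part of the original report):
// collect every character with preg_match_all() rather than preg_split('//u'),
// so no empty-match advancing logic is involved.
//   /s  lets "." also match "\n"
//   /u  treats the subject as UTF-8, so "." consumes one whole code point
$txt = "abc\n\txyz!";
preg_match_all('/./su', $txt, $m);
$chars = $m[0]; // full-pattern matches: one character per element

print_r($chars);
echo '$chars[3]: len = ', strlen($chars[3]), ', hex = ', bin2hex($chars[3]), "\n";
```

For this input, the output should match the "Expected result" above, with "\n" and "\t" as separate elements.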

History

 [2008-08-18 11:47 UTC] thunder013 at yopmail dot com
Oh... In fact, bug #42737, which I linked, concerns the sequence "\n\r", not "\n\t"...
 [2008-08-18 22:09 UTC] felipe@php.net
Bug #42737 was fixed in 5_3 and HEAD.
Running your example code on 5.3, I got the expected result.
 [2008-08-20 00:52 UTC] jani@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.

Merged the VERY simple fix to PHP_5_2. 
 
PHP Copyright © 2001-2021 The PHP Group
All rights reserved.
Last updated: Thu Oct 21 11:03:34 2021 UTC