             [2008-08-18 11:47 UTC] thunder013 at yopmail dot com
  [2008-08-18 22:09 UTC] felipe@php.net
  [2008-08-20 00:52 UTC] jani@php.net
Copyright © 2001-2025 The PHP Group. All rights reserved.
Last updated: Tue Nov 04 09:00:01 2025 UTC
Description:
------------
I want to use preg_split with the u modifier to split a UTF-8 string into individual characters, like this:

preg_split('//u', $txt, -1, PREG_SPLIT_NO_EMPTY);

This works fine, except when the string contains the sequence "\n\t" (bytes 0x0A 0x09): the two characters are NOT split apart (see the reproduce code below). The bug does not occur when preg_split is used without the u modifier.

This bug was previously reported for version 5.2.4 at http://bugs.php.net/bug.php?id=42737, but it is nevertheless *still* present in versions 5.2.5 and 5.2.6!

Reproduce code:
---------------
<?php
$txt = "abc\n\txyz!";
$tab = preg_split('//u', $txt, -1, PREG_SPLIT_NO_EMPTY);
print_r($tab);
echo '$tab[3]: len = ', strlen($tab[3]), ', hex = ', bin2hex($tab[3]), "\n";
?>

Expected result:
----------------
Array
(
    [0] => a
    [1] => b
    [2] => c
    [3] => 

    [4] => 	
    [5] => x
    [6] => y
    [7] => z
    [8] => !
)

$tab[3]: len = 1, hex = 0a

Actual result:
--------------
$ php test.php
Array
(
    [0] => a
    [1] => b
    [2] => c
    [3] => 
	
    [4] => x
    [5] => y
    [6] => z
    [7] => !
)

$tab[3]: len = 2, hex = 0a09
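
For comparison, the same intent (one array element per UTF-8 character) can also be expressed by matching characters rather than splitting on the empty pattern. The sketch below is not part of the original report: the helper name `utf8_chars` is hypothetical, and the report does not say whether the affected 5.2.x builds handle this pattern correctly, so treat it as something to try rather than a confirmed workaround.

```php
<?php
// Hypothetical helper (not from the report): return the characters of a
// UTF-8 string by matching each one instead of splitting on '//u'.
function utf8_chars($text)
{
    // 's' lets '.' match "\n" as well; 'u' treats pattern and subject as UTF-8.
    $count = preg_match_all('/./su', $text, $matches);
    if ($count === false) {
        // preg_match_all() returns false on error, e.g. invalid UTF-8 input.
        return array();
    }
    return $matches[0];
}

$tab = utf8_chars("abc\n\txyz!");
echo count($tab), "\n";
echo bin2hex($tab[3]), ' ', bin2hex($tab[4]), "\n";
```

On a build where the pattern behaves as documented, `$tab` has 9 elements, with "\n" (hex 0a) at index 3 and "\t" (hex 09) at index 4, which matches the expected result of the reproduce code.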