[2008-08-18 11:47 UTC] thunder013 at yopmail dot com
[2008-08-18 22:09 UTC] felipe@php.net
[2008-08-20 00:52 UTC] jani@php.net
Copyright © 2001-2025 The PHP Group. All rights reserved.
Last updated: Thu Oct 30 01:00:01 2025 UTC
Description:
------------
I want to use preg_split with the u modifier to split a UTF-8 string into its individual characters, like this:

preg_split('//u', $txt, -1, PREG_SPLIT_NO_EMPTY);

It works fine, except when the string contains the sequence "\n\t" (bytes 0x0A 0x09): the two characters are NOT split (see the attached example). Note that this bug does not occur when preg_split is used without the u modifier.

This bug was previously reported for version 5.2.4 at http://bugs.php.net/bug.php?id=42737, but it is nevertheless *still* present in versions 5.2.5 and 5.2.6!

Reproduce code:
---------------
<?php
$txt = "abc\n\txyz!";
$tab = preg_split('//u', $txt, -1, PREG_SPLIT_NO_EMPTY);
print_r($tab);
echo '$tab[3]: len = ', strlen($tab[3]), ', hex = ', bin2hex($tab[3]), "\n";
?>

Expected result:
----------------
Array
(
    [0] => a
    [1] => b
    [2] => c
    [3] => 

    [4] => 	
    [5] => x
    [6] => y
    [7] => z
    [8] => !
)

$tab[3]: len = 1, hex = 0a

Actual result:
--------------
$ php test.php
Array
(
    [0] => a
    [1] => b
    [2] => c
    [3] => 
	
    [4] => x
    [5] => y
    [6] => z
    [7] => !
)

$tab[3]: len = 2, hex = 0a09
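
For comparison, the per-character split described in this report can also be written by matching each character instead of splitting on the empty pattern. This is only a minimal sketch: the helper name split_utf8_chars is hypothetical and not part of the report, and the report does not confirm whether this form avoids the problem on the affected 5.2.x versions. On a build that behaves as expected, it yields nine elements for the test string, with "\n" and "\t" as separate entries.

```php
<?php
// Hypothetical helper (not from the original report): return the
// characters of a UTF-8 string as an array, one character per element.
function split_utf8_chars($txt)
{
    // 'u' treats pattern and subject as UTF-8;
    // 's' lets '.' match newlines as well, so "\n" is not skipped.
    $count = preg_match_all('/./su', $txt, $matches);
    if ($count === false) {
        // preg_match_all() returns false on error, e.g. invalid UTF-8.
        return array();
    }
    return $matches[0];
}

$txt   = "abc\n\txyz!";
$chars = split_utf8_chars($txt);

echo 'count = ', count($chars), "\n";
echo '$chars[3]: len = ', strlen($chars[3]), ', hex = ', bin2hex($chars[3]), "\n";
echo '$chars[4]: len = ', strlen($chars[4]), ', hex = ', bin2hex($chars[4]), "\n";
```

Comparing count($chars) with the number of elements returned by preg_split('//u', ...) on the same input is a quick way to check whether a given PHP build shows the behavior reported here.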