[2008-03-12 19:39 UTC] nlopess@php.net
Description:
------------
$split = preg_split('#(\s)#', $value, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

splits a sentence into words incorrectly when the sentence is Russian UTF-8 text and begins with the Russian letter 'Р' (hex D0h A0h).

For example, the Russian string "Расширенные поля пользователей" is split by PHP 5.2.5 into 7(!) elements, whereas PHP 4 splits it correctly into 5. I think the problem is the Russian letter 'Р', which is split off as a single word.

Reproduce code:
---------------
<?php
$value = "Расширенные поля пользователей";
header('Content-type: text/html; charset=utf-8');
print_r($value . "<BR><BR><BR>");
$split = preg_split('#(\s)#', $value, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($split);
?>

Expected result:
----------------
Array
(
    [0] => Расширенные
    [1] =>  
    [2] => поля
    [3] =>  
    [4] => пользователей
)

Actual result:
--------------
Array
(
    [0] => Р
    [1] => 
    [2] => асширенные
    [3] =>  
    [4] => поля
    [5] =>  
    [6] => пользователей
)
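
A possible workaround, offered as a sketch and not as a confirmed diagnosis: the second byte of 'Р' in UTF-8 is A0h, and without the `u` pattern modifier the subject is matched byte by byte, so `\s` may be treating that byte as whitespace on some builds or locales. This explanation is an assumption and is not stated in the report. The example below uses the same subject and flags as the reproduce code and only adds the `u` modifier, which makes PCRE treat both pattern and subject as UTF-8. It assumes the script file itself is saved as UTF-8.

```php
<?php
// Same subject and flags as in the report; only the "u" modifier is added.
$value = "Расширенные поля пользователей";

// With "u", \s is matched against whole UTF-8 characters, not single bytes,
// so the A0h byte inside 'Р' can no longer be taken for a whitespace character.
$split = preg_split(
    '#(\s)#u',
    $value,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);

print_r($split);
```

With the `u` modifier the call should return the five elements listed under "Expected result": three words and the two captured space delimiters.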