[2014-06-20 13:10 UTC] johannes@php.net
-Status: Open
+Status: Not a bug
[2014-06-20 13:28 UTC] test at test dot com
-: alexandre at abrioux dot fr
+: test at test dot com
Description:
------------
The documentation for PREG_SPLIT_OFFSET_CAPTURE says:

"[...] the return value [is] an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1."

The "string offset" is wrong when the subject string is encoded in UTF-8: it does not match the character position of each piece.
Maybe the function is using strlen() internally instead of mb_strlen()?

Note that I use the "u" modifier in the regex.

Test script:
---------------
<?php
header("Content-Type: text/plain; charset=utf-8");
var_dump(preg_split('# #u', 'à é ù', 0, PREG_SPLIT_OFFSET_CAPTURE));
?>

Actual result:
--------------
array(3) {
  [0]=>
  array(2) {
    [0]=>
    string(2) "à"
    [1]=>
    int(0)
  }
  [1]=>
  array(2) {
    [0]=>
    string(2) "é"
    [1]=>
    int(3)
  }
  [2]=>
  array(2) {
    [0]=>
    string(2) "ù"
    [1]=>
    int(6)
  }
}
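
The reported offsets (0, 3, 6) are consistent with byte offsets: each accented letter takes 2 bytes in UTF-8 and each space takes 1 byte, which matches the string(2) lengths in the dump. If character offsets are what is needed, one possible workaround is to convert each byte offset afterwards. The sketch below assumes the mbstring extension is loaded and PHP 7.1 or later; the helper name byte_to_char_offset is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): converts a byte offset into a
// UTF-8 string to a character (code point) offset.
// Assumes the mbstring extension is loaded and that $byteOffset falls on a
// character boundary, as the offsets returned with the "u" modifier do.
function byte_to_char_offset(string $subject, int $byteOffset): int
{
    // Count the characters in the bytes that precede the offset.
    return mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
}

// Same subject as the test script, written with escapes so the result
// does not depend on the encoding of this source file.
$subject = "\u{E0} \u{E9} \u{F9}"; // "à é ù"

$parts = preg_split('# #u', $subject, 0, PREG_SPLIT_OFFSET_CAPTURE);

$charOffsets = [];
foreach ($parts as [$text, $byteOffset]) {
    $charOffsets[] = byte_to_char_offset($subject, $byteOffset);
}

// Byte offsets from preg_split: 0, 3, 6
// Character offsets after conversion: 0, 2, 4
```

Converting after the split leaves the byte offsets available for byte-oriented functions such as substr(), while the converted values can be passed to mb_substr() and similar character-oriented functions.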