[2012-04-26 20:18 UTC] poinsot dot julien at gmail dot com
Description:
------------
I don't know whether this really qualifies as a bug: full case folding may result in wrong offset calculations for the few code points that expand to more than one code point (up to three). For example, "ß" expands to "ss": the length is no longer the same, so the grapheme_stri* functions may return results that differ from what the user expects.
A simple workaround would be to use simple case folding instead, even though it is more limited.
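The difference between the two folding modes can be seen with a minimal sketch. It assumes PHP 7.3 or later with the mbstring extension, where `mb_convert_case()` accepts `MB_CASE_FOLD` (full folding) and `MB_CASE_FOLD_SIMPLE` (simple folding). It only illustrates the length change and is not the code path the intl extension uses.

```php
<?php
// Assumption: PHP >= 7.3 with mbstring loaded.
$word = 'Straße';

// Full case folding may expand one code point into several ("ß" -> "ss").
$full = mb_convert_case($word, MB_CASE_FOLD, 'UTF-8');

// Simple case folding always maps one code point to one code point.
$simple = mb_convert_case($word, MB_CASE_FOLD_SIMPLE, 'UTF-8');

var_dump(
    mb_strlen($word, 'UTF-8'),   // int(6)
    mb_strlen($full, 'UTF-8'),   // int(7): "strasse"
    mb_strlen($simple, 'UTF-8')  // int(6): "straße"
);
```

Because simple folding preserves the number of code points, offsets found in the folded string still line up with the original string.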
Test script:
---------------
$haystack = 'Auf der Straße nach Paris habe ich mit dem Fahrer gesprochen';
var_dump(
grapheme_stristr($haystack, 'Paris '),
grapheme_substr($haystack, grapheme_stripos($haystack, 'Paris'))
);
Expected result:
----------------
string(40) "Paris habe ich mit dem Fahrer gesprochen"
string(40) "Paris habe ich mit dem Fahrer gesprochen"
Actual result:
--------------
string(39) "aris habe ich mit dem Fahrer gesprochen"
string(39) "aris habe ich mit dem Fahrer gesprochen"
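The off-by-one in the actual result can be reproduced outside the intl extension with a rough sketch. It assumes PHP 7.3 or later with mbstring, and it assumes the fault comes from applying an offset found in the fully folded haystack to the original haystack. The variable names are illustrative only, and this is not the extension's actual code.

```php
<?php
// Assumption: PHP >= 7.3 with mbstring loaded.
$haystack = 'Auf der Straße nach Paris habe ich mit dem Fahrer gesprochen';
$needle   = 'Paris';

// Fold both strings with full case folding ("ß" becomes "ss").
$foldedHaystack = mb_convert_case($haystack, MB_CASE_FOLD, 'UTF-8');
$foldedNeedle   = mb_convert_case($needle, MB_CASE_FOLD, 'UTF-8');

// The offset in the folded string is shifted by one because of the expansion.
$foldedOffset = mb_strpos($foldedHaystack, $foldedNeedle, 0, 'UTF-8');
// The offset in the original string.
$trueOffset = mb_strpos($haystack, $needle, 0, 'UTF-8');

var_dump(
    $foldedOffset,                                        // int(21)
    $trueOffset,                                          // int(20)
    mb_substr($haystack, $foldedOffset, null, 'UTF-8'),   // "aris habe ich ..."
    mb_substr($haystack, $trueOffset, null, 'UTF-8')      // "Paris habe ich ..."
);
```

Each expanding code point before the match shifts the folded offset further, so the error grows with the number of such characters in the haystack.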
Patches:
--------
grapheme_util.c (last revision 2012-04-26 20:19 UTC by poinsot dot julien at gmail dot com)
Btw, here is an mbstring-based PHP implementation that should do the job correctly:

function grapheme_stripos($s, $needle, $offset = 0)
{
    if ($offset < 0) $offset = 0;
    // mb_stripos() returns a code point offset, or false when there is no match;
    // both 0 and false are returned unchanged.
    if (!$needle = mb_stripos($s, $needle, $offset, 'UTF-8')) return $needle;
    // Convert the code point offset into a grapheme offset.
    return grapheme_strlen(mb_substr($s, 0, $needle, 'UTF-8'));
}

function grapheme_strripos($s, $needle, $offset = 0)
{
    if ($offset < 0) $offset = 0;
    if (!$needle = mb_strripos($s, $needle, $offset, 'UTF-8')) return $needle;
    return grapheme_strlen(mb_substr($s, 0, $needle, 'UTF-8'));
}