[2014-11-19 05:38 UTC] masakielastic at gmail dot com
Description:
------------
grapheme_extract takes an extra trailing character when the string contains a character from the Variation Selectors Supplement block (U+E0100...U+E01EF). These variation selectors are used for last names and place names in Japanese.
https://en.wikipedia.org/Variation_Selectors_Supplement_(Unicode_block)

The test data, Katsushika-ku ("葛\xF3\xA0\x84\x81飾区" - U+845B U+E0101 U+98FE U+533A), is a place name in Tokyo, Japan.
https://en.wikipedia.org/wiki/Katsushika

One of the famous companies in Katsushika-ku is Tomy Company, Ltd.
https://en.wikipedia.org/Tomy

You can see the glyph variant for U+845B U+E0101 on the following site.
http://ivd.dicey.org/17408

Test script:
---------------
grapheme_extract("葛\xF3\xA0\x84\x81飾区", 1);

Expected result:
----------------
"葛\xF3\xA0\x84\x81"

Actual result:
--------------
"葛\xF3\xA0\x84\x81飾"
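The code points quoted for the test data can be checked directly from the bytes. The following is a minimal sketch, assuming well-formed UTF-8 input; `utf8_code_points` is a hypothetical helper written for this illustration, not a PHP or intl function. It shows that the byte sequence `\xF3\xA0\x84\x81` decodes to U+E0101, a 4-byte character above U+FFFF.

```php
<?php
// Hypothetical helper (not part of PHP or intl): decode a UTF-8 string
// into a list of "U+XXXX" labels. Assumes the input is well-formed UTF-8.
function utf8_code_points($s)
{
    $points = array();
    $len = strlen($s);
    for ($i = 0; $i < $len; ) {
        $b = ord($s[$i]);
        if ($b < 0x80) {            // 1-byte sequence (ASCII)
            $cp = $b;
            $n = 1;
        } elseif ($b < 0xE0) {      // 2-byte sequence
            $cp = $b & 0x1F;
            $n = 2;
        } elseif ($b < 0xF0) {      // 3-byte sequence
            $cp = $b & 0x0F;
            $n = 3;
        } else {                    // 4-byte sequence (above U+FFFF)
            $cp = $b & 0x07;
            $n = 4;
        }
        // Each continuation byte contributes its low 6 bits.
        for ($j = 1; $j < $n; $j++) {
            $cp = ($cp << 6) | (ord($s[$i + $j]) & 0x3F);
        }
        $points[] = sprintf('U+%04X', $cp);
        $i += $n;
    }
    return $points;
}

// Prints: U+845B U+E0101 U+98FE U+533A
echo implode(' ', utf8_code_points("葛\xF3\xA0\x84\x81飾区")), "\n";
```

With this view of the input, the expected result of the test script is the first two code points (the base character plus its variation selector), 7 bytes in total, while the reported actual result also includes U+98FE.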
Copyright © 2001-2025 The PHP Group. All rights reserved.
Last updated: Tue Oct 28 21:00:01 2025 UTC
A PR has been added. This bug occurs with any character above U+FFFF, and the returned value can include more than one extraneous character.

<?php
$latin = 'AAABBBBB';
var_dump(mb_strlen(grapheme_extract($latin, 3))); // 3

$latin = 'AAABBBBB';
var_dump(mb_strlen(grapheme_extract($latin, 3))); // 3

$latin = '????????';
var_dump(mb_strlen(grapheme_extract($latin, 3))); // should be 3, got 6

$emoticon = '?';
var_dump(grapheme_extract($emoticon, 1, GRAPHEME_EXTR_MAXCHARS)); /* should be '?', got '' */

$emoticon = '??';
var_dump(grapheme_extract($emoticon, 4, GRAPHEME_EXTR_MAXBYTES)); /* should be '?', got '' */

$k1024 = '?????';
if (grapheme_strlen($k1024) == 3) {
    var_dump(grapheme_extract($k1024, 1)); /* should be '??', got '????' */
} else {
    echo "ICU doesn't support Unicode 7.0\n";
}
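To tell whether a given input falls into the affected class (characters above U+FFFF), one can count the 4-byte UTF-8 sequences it contains. This is a minimal sketch, assuming well-formed UTF-8, where only 4-byte sequences begin with a lead byte in the range 0xF0-0xF4. `count_supplementary` is a hypothetical name used for illustration only, and U+1F600 is an arbitrary example of an emoticon, not necessarily the one used in the script above.

```php
<?php
// Hypothetical helper: count code points above U+FFFF in a UTF-8 string.
// Assumes well-formed UTF-8, so every lead byte 0xF0-0xF4 starts
// exactly one 4-byte sequence.
function count_supplementary($s)
{
    $count = 0;
    for ($i = 0, $len = strlen($s); $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b >= 0xF0 && $b <= 0xF4) {
            $count++;
        }
    }
    return $count;
}

var_dump(count_supplementary('AAABBBBB'));                 // int(0)
var_dump(count_supplementary("葛\xF3\xA0\x84\x81飾区"));    // int(1)
var_dump(count_supplementary("\xF0\x9F\x98\x80"));         // int(1), U+1F600
```

A non-zero count marks a string as a candidate for reproducing the behaviour described in this comment; a count of zero matches the plain Latin case, which returns the expected length.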