[2016-08-19 16:18 UTC] cmb@php.net
-Status: Open
+Status: Analyzed
-Assigned To:
+Assigned To: cmb
[2016-08-19 16:18 UTC] cmb@php.net
[2016-08-19 17:06 UTC] cmb@php.net
-Summary: grapheme_*: ASCII optimisation is not Unicode compliant on CR LF sequence
+Summary: grapheme_*() is not Unicode compliant on CR LF sequence
[2016-08-20 01:22 UTC] cmb@php.net
[2016-08-20 01:22 UTC] cmb@php.net
-Status: Analyzed
+Status: Closed
[2016-10-17 10:09 UTC] bwoebi@php.net
Copyright © 2001-2025 The PHP Group. All rights reserved.
Last updated: Sun Oct 26 01:00:01 2025 UTC
Description:
------------
The ASCII optimisation in the grapheme_* functions counts a CR + LF sequence as 2 distinct graphemes, but the Unicode standard defines it as a single grapheme. This leads to inconsistent results depending on whether the string contains a code point > 0x7F or not. E.g.:

grapheme_strlen("\r\na") != grapheme_strlen("\r\né")

A workaround for now could be to append an invisible or whitespace character to the string, like:

function my_grapheme_strlen($string) {
    // Append ZERO WIDTH NO-BREAK SPACE (U+FEFF, UTF-8 bytes EF BB BF),
    // then subtract the one extra grapheme it adds.
    return grapheme_strlen($string . "\xef\xbb\xbf") - 1;
}

Test script:
---------------
var_dump(grapheme_strlen("\r\n"));
var_dump(grapheme_substr(implode("\r\n", ['abc', 'def', 'ghi']), 5));

Expected result:
----------------
int(1)
string(7) "ef
ghi"

Actual result:
--------------
int(2)
string(8) "def
ghi"
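
For comparison, the expected counts can be cross-checked without the intl extension. The following is a minimal sketch, not part of the report or of intl: `count_graphemes_pcre()` is a hypothetical helper that relies on PCRE's `\X` escape (extended grapheme cluster), which is assumed here to be available with the `u` modifier and which does not break between CR and LF. The loop prints the PCRE count next to `grapheme_strlen()` when intl is loaded; what `grapheme_strlen()` reports depends on the installed PHP version.

```php
<?php
// Hypothetical helper (not an intl function): counts extended grapheme
// clusters with PCRE's \X, which keeps a CR LF pair in one cluster.
function count_graphemes_pcre(string $string): int
{
    $count = preg_match_all('/\X/u', $string);
    if ($count === false) {
        // preg_match_all() returns false on error, e.g. malformed UTF-8.
        throw new InvalidArgumentException('preg_match_all() failed; input may not be valid UTF-8');
    }
    return $count;
}

// Samples from the report: CR LF alone, followed by ASCII, followed by "é".
foreach (["\r\n", "\r\na", "\r\n\xc3\xa9"] as $sample) {
    $line = sprintf('%-14s pcre=%d', json_encode($sample), count_graphemes_pcre($sample));
    if (function_exists('grapheme_strlen')) {
        // Only available when the intl extension is loaded.
        $line .= sprintf(' intl=%d', grapheme_strlen($sample));
    }
    echo $line, PHP_EOL;
}
```

If the two columns disagree for the ASCII-only samples but agree for the sample containing "é", the installed build still shows the behaviour described in this report.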