Bug #79308 Unexpected length of unicode string returned
Submitted: 2020-02-26 12:50 UTC Modified: 2020-02-26 14:53 UTC
From: dpiekarski at dompie dot de Assigned: cmb (profile)
Status: Not a bug Package: *Unicode Issues
PHP Version: 7.4.3 OS: Ubuntu 18.04.4 LTS
Private report: No CVE-ID: None


 [2020-02-26 12:50 UTC] dpiekarski at dompie dot de
Calling grapheme_strlen() on the string 'नमस्ते' returns 3 instead of the expected 4.

ICU version => 65.1
ICU Data version => 65.1
ICU TZData version => 2019c
ICU Unicode version => 12.1
iconv library version => 2.27

On another system (Debian 10) with PHP 7.3 and
ICU version => 63.1
ICU Data version => 63.1
ICU TZData version => 2018e
ICU Unicode version => 11.0
iconv library version => 2.28

it works as expected.

Test script:

$word = 'नमस्ते';
var_dump(grapheme_strlen($word));

Expected result:

int(4)

Actual result:

int(3)

 [2020-02-26 14:53 UTC]
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2020-02-26 14:53 UTC]
This is not related to the PHP version; e.g. PHP 7.3 with ICU
66.0.1 prints int(3) as well, so it's obviously an upstream issue.
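
To check which ICU build a given PHP installation is linked against, a minimal sketch along these lines may help. It assumes ext/intl is loaded; the variable name `$length` is only illustrative, and the count it prints depends on the ICU library rather than on the PHP release.

```php
<?php
// Sketch only: requires ext/intl (PHP 7+ for IntlChar).
$word = 'नमस्ते';

// INTL_ICU_VERSION and INTL_ICU_DATA_VERSION are constants defined by ext/intl.
printf("PHP:             %s\n", PHP_VERSION);
printf("ICU:             %s\n", INTL_ICU_VERSION);
printf("ICU data:        %s\n", INTL_ICU_DATA_VERSION);
printf("Unicode version: %s\n", implode('.', IntlChar::getUnicodeVersion()));

// This thread reports 4 with ICU 63.1 and 3 with ICU 65.1 and 66.0.1.
$length = grapheme_strlen($word);
var_dump($length);
```

Running this on both systems makes it easy to compare the result against the ICU version instead of the PHP version.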

 [2020-03-25 22:38 UTC] srl295 at gmail dot com
This is a User Perceived Character, aka Extended Grapheme Cluster, see

`स्ते` is one cluster. If you step through the string नमस्ते with the arrow keys in a Unicode-aware (especially GUI) text editor, you will see that after the "m" the cursor and selection move over the "ste" as a single unit and do not break inside it.

See for example where "षि" is one grapheme cluster.

So I would say this is not a bug.
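
The cluster boundaries can also be inspected from PHP itself. The following is a rough sketch, assuming ext/intl and ext/mbstring on PHP 7.4 or later; it uses IntlBreakIterator's character instance, which is assumed here to follow the same ICU rules as grapheme_strlen(). The variable names are illustrative, and the number of clusters printed depends on the installed ICU version.

```php
<?php
// Sketch only: requires ext/intl and ext/mbstring, PHP 7.4+ (mb_str_split, arrow functions).
$word = 'नमस्ते';

// A character break iterator walks user-perceived character boundaries.
$iterator = IntlBreakIterator::createCharacterInstance();
$iterator->setText($word);

// Collect each part (one grapheme cluster per element).
$clusters = [];
foreach ($iterator->getPartsIterator() as $part) {
    $clusters[] = $part;
}

foreach ($clusters as $i => $cluster) {
    // Show the code points that make up each cluster.
    $codePoints = array_map(
        fn (string $char): string => sprintf('U+%04X', mb_ord($char, 'UTF-8')),
        mb_str_split($cluster, 1, 'UTF-8')
    );
    printf("%d: %s (%s)\n", $i + 1, $cluster, implode(' ', $codePoints));
}

printf("clusters: %d, grapheme_strlen: %d\n", count($clusters), grapheme_strlen($word));
```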
 [2020-03-26 10:17 UTC] dpiekarski at dompie dot de
So you are basically saying that the last two grapheme clusters on this page are really one?

Especially this image:

I'm still not convinced. On my system (Ubuntu 18), every editor I tried, including PhpStorm, shows 4 grapheme clusters, as in the linked image above.

And you are saying this is all wrong? I don't know the language or the word, but technically it looks to me as though the length should be 4. The last two letters of the word 'नमस्ते' are each already grapheme clusters.
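
For reference, the three notions of length involved here can be compared side by side. This is a hedged sketch, assuming ext/intl and ext/mbstring on PHP 7.4 or later, and assuming the word consists of exactly the six code points spelled out below. U+094D is DEVANAGARI SIGN VIRAMA, the sign that links स to the following त; whether the two form one cluster is what differs between the ICU versions in this report.

```php
<?php
// Sketch only: requires ext/intl and ext/mbstring, PHP 7.4+ (mb_str_split).
// Assumption: the word consists of exactly these six code points.
$word = "\u{0928}\u{092E}\u{0938}\u{094D}\u{0924}\u{0947}";

// Print each code point with its Unicode character name.
foreach (mb_str_split($word, 1, 'UTF-8') as $char) {
    $codePoint = mb_ord($char, 'UTF-8');
    printf("U+%04X  %s\n", $codePoint, IntlChar::charName($codePoint));
}

$bytes      = strlen($word);             // UTF-8 bytes: 18 (3 bytes per code point)
$codePoints = mb_strlen($word, 'UTF-8'); // code points: 6
$graphemes  = grapheme_strlen($word);    // 3 or 4 in this thread, depending on the ICU version

printf("bytes: %d, code points: %d, graphemes: %d\n", $bytes, $codePoints, $graphemes);
```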