Bug #81584 mb_strwidth() returns incorrect value
Submitted: 2021-11-01 17:17 UTC Modified: 2021-11-01 19:53 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:0 (0.0%)
From: azjezz at protonmail dot com Assigned: alexdowad
Status: Closed Package: mbstring related
PHP Version: 8.1.0RC5 OS: N/A
Private report: No CVE-ID: None

 [2021-11-01 17:17 UTC] azjezz at protonmail dot com
Description:
------------
The width that mb_strwidth() reports for strings containing emoji changed between PHP <= 8.0 and PHP 8.1.


Test script:
---------------
var_dump(
  mb_strwidth('☕ ☕ ☕'),
  mb_strwidth('🥇🥈🥉'),
  mb_strwidth('♈♉♊♋♌♍♎♏♐♑♒♓'),
);

Expected result:
----------------
int(5)
int(3)
int(12)

Actual result:
--------------
int(8)
int(6)
int(24)

History

 [2021-11-01 17:19 UTC] azjezz at protonmail dot com
test script: https://3v4l.org/X6DWb
 [2021-11-01 17:44 UTC] nikic@php.net
-Assigned To: +Assigned To: alexdowad
 [2021-11-01 19:32 UTC] alexdowad@php.net
-Status: Assigned +Status: Analyzed
 [2021-11-01 19:32 UTC] alexdowad@php.net
Azjezz, thanks for the report!

First of all, the Unicode Consortium maintains the standard on which Unicode characters are considered 'halfwidth' and which are 'fullwidth'. See: http://www.unicode.org/reports/tr11/

If you download all the data tables for the Unicode standard from unicode.org, you can see the raw data on character width in the file EastAsianWidth.txt.
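The data file can also be read programmatically. The following is a minimal sketch, not the way mbstring itself builds its table. It assumes data lines of the form start..end;Class (or a single code point before the ;) followed by an optional # comment, and it tolerates extra whitespace because the column layout has varied between Unicode releases. The helper names parse_eaw() and eaw_class() are hypothetical, the three sample lines are an abridged excerpt, and unlisted code points simply default to N here (the real file's header notes that some unlisted ranges default to W).

```php
<?php
// Hypothetical helper: parse lines in the EastAsianWidth.txt format into
// [start, end, class] triples.
function parse_eaw(iterable $lines): array
{
    $ranges = [];
    foreach ($lines as $line) {
        // Drop the trailing "# ..." comment and surrounding whitespace.
        $line = trim(explode('#', $line, 2)[0]);
        if ($line === '' || !str_contains($line, ';')) {
            continue; // blank, comment-only or malformed line
        }
        [$cp, $class] = array_map('trim', explode(';', $line, 2));
        $bounds = explode('..', $cp);
        $ranges[] = [hexdec($bounds[0]), hexdec($bounds[1] ?? $bounds[0]), $class];
    }
    return $ranges;
}

// Hypothetical helper: look up the width class of one code point.
function eaw_class(array $ranges, int $codePoint): string
{
    foreach ($ranges as [$start, $end, $class]) {
        if ($codePoint >= $start && $codePoint <= $end) {
            return $class;
        }
    }
    return 'N'; // simplification: everything unlisted is Neutral in this sketch
}

// Abridged sample lines (older, compact layout of the file).
$ranges = parse_eaw([
    '0020;Na         # SPACE',
    '2614..2615;W    # UMBRELLA WITH RAIN DROPS..HOT BEVERAGE',
    '2648..2653;W    # ARIES..PISCES',
]);

var_dump(eaw_class($ranges, 0x2615)); // string(1) "W"  (coffee cup)
var_dump(eaw_class($ranges, 0x0020)); // string(2) "Na" (space)
```

mb_strwidth() counts characters in the W and F classes as 2 columns and everything else as 1, which is why the class of each character matters for the results in this report.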

In the first example you kindly provided, the coffee cup emoji are U+2615. The latest version of the Unicode standard indicates these are 'fullwidth' characters, so each one counts as 2 columns. Adding 1 for each of the 2 spaces gives 3 × 2 + 2 = 8, so the output of 8 is correct.

Likewise, the gold, silver, and bronze medal emoji are all fullwidth. So 6 is the correct output.

And as for the horoscope symbols... yep, they are also fullwidth.
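The arithmetic above (2 columns per wide character, 1 for everything else) can be reproduced without mbstring, using only PCRE. This sketch hard-codes just the three ranges marked W in EastAsianWidth.txt that the test strings need: U+2614..U+2615 (coffee cup), U+2648..U+2653 (zodiac signs) and U+1F947..U+1F949 (medals, assuming the second test string is the gold, silver and bronze medal emoji). The helper sketch_strwidth() and the constant WIDE_SUBSET are hypothetical, and the subset is nowhere near the full table mbstring uses, so it under-counts any other wide text.

```php
<?php
// Hypothetical subset of W (wide) ranges from EastAsianWidth.txt; only
// enough to cover the three strings in this report.
const WIDE_SUBSET = '\x{2614}-\x{2615}\x{2648}-\x{2653}\x{1F947}-\x{1F949}';

function sketch_strwidth(string $s): int
{
    // Every code point contributes 1 column...
    $codePoints = preg_match_all('/./su', $s);
    // ...and every code point in the wide subset contributes 1 more.
    $wide = preg_match_all('/[' . WIDE_SUBSET . ']/u', $s);
    return $codePoints + $wide;
}

var_dump(
    sketch_strwidth('☕ ☕ ☕'),          // int(8)  = 3 × 2 + 2 spaces
    sketch_strwidth('🥇🥈🥉'),           // int(6)  = 3 × 2
    sketch_strwidth('♈♉♊♋♌♍♎♏♐♑♒♓'), // int(24) = 12 × 2
);
```

These match the "Actual result" from PHP 8.1. Treating the same characters as narrow (1 column each) gives the old values 5, 3 and 12.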

In summary, the latest version of PHP complies with the latest Unicode standard, and the previous version of PHP did not. Now, I would like to find out... is this causing a problem for you? Is there some reason why you preferred the old Unicode standard to the new one? Looking forward to your feedback.
 [2021-11-01 19:36 UTC] azjezz at protonmail dot com
Hey Alex, Thank you for taking a look at this.

Personally, I don't have a preference. I encountered this while upgrading one of my packages to PHP 8.1 (ref: https://github.com/azjezz/psl/pull/246), and was advised to report it here on R13.

If you think this is not a bug, you can mark it as resolved :)

Thanks again!
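When a package has to pass the same test suite on both PHP 8.0 and 8.1, as in the upgrade mentioned above, one option is to probe the running mbstring instead of branching on the PHP version number. This is a minimal sketch that assumes U+2615 HOT BEVERAGE is a reliable marker: it is width 1 on PHP <= 8.0 and width 2 on PHP 8.1, per the results in this report. The function name uses_wide_emoji_table() is hypothetical.

```php
<?php
// Hypothetical helper: true if the running mbstring counts U+2615 as wide
// (PHP 8.1 behaviour in this report), false if it counts it as narrow
// (PHP <= 8.0), null if the mbstring extension is not loaded.
function uses_wide_emoji_table(): ?bool
{
    if (!function_exists('mb_strwidth')) {
        return null;
    }
    return mb_strwidth("\u{2615}", 'UTF-8') === 2;
}

// Choose the expected width of '☕ ☕ ☕' (8 on PHP 8.1, 5 on PHP <= 8.0,
// per this report) according to the table actually in use.
$expected = uses_wide_emoji_table() ? 8 : 5;
var_dump($expected);
```

Probing the behaviour rather than the version keeps the test correct if the width data changes again in a later release.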
 [2021-11-01 19:53 UTC] alexdowad@php.net
-Status: Analyzed +Status: Closed
 [2021-11-01 19:53 UTC] alexdowad@php.net
Thanks very much for explaining.

Yeah, I would say that if anything, the new behavior is more correct than the old one.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Dec 03 07:01:33 2024 UTC