Bug #79122 mb_convert_encoding doesn't convert properly to UTF-16
Submitted: 2020-01-15 17:28 UTC
Modified: 2020-01-15 17:30 UTC
From: pascu dot vlad at gmail dot com
Assigned:
Status: Not a bug
Package: Unicode Engine related
PHP Version: Irrelevant
OS: Irrelevant
Private report: No
CVE-ID: None
 
 [2020-01-15 17:28 UTC] pascu dot vlad at gmail dot com
Description:
------------
mb_convert_encoding does not seem to produce correct results when dealing with higher Unicode characters (code points outside the Basic Multilingual Plane, such as emoji). When supplied with UTF-8 encoded strings, the mb_* functions return valid results for character count, code point and byte count. However, after converting to UTF-16, the whole string appears mangled.

I have attached 4 links with relevant examples.

Test script:
---------------
https://3v4l.org/LQott
https://3v4l.org/DrEcL
https://3v4l.org/acOTR
https://3v4l.org/5VXFC

Expected result:
----------------
Correct result for total length (number of characters) and each character (code point + byte count) in both encodings, as documented here (links in order with code examples):

https://www.iemoji.com/view/emoji/885/smileys-people/grinning-face
https://www.iemoji.com/view/emoji/1659/flags/romania
https://www.iemoji.com/view/emoji/1379/skin-tones/baby-light-skin-tone
https://www.iemoji.com/view/emoji/306/symbols/hash-key

Actual result:
--------------
Correct result only for the supplied UTF-8 string and no valid results for any of the converted strings.

 [2020-01-15 17:30 UTC] nikic@php.net
-Status: Open
+Status: Not a bug
 [2020-01-15 17:30 UTC] nikic@php.net
You are not passing 'UTF-16' as the $encoding parameter, so all your strings are interpreted as UTF-8.
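
To illustrate the point above, here is a minimal sketch. It is not the reporter's script (the linked 3v4l snippets are not reproduced here); the variable names are made up for illustration, and it assumes the mbstring extension is loaded and PHP 7.2 or later (for mb_ord). A PHP string is just a sequence of bytes with no encoding attached, so every mb_* call on the converted string has to be told that it is UTF-16:

```php
<?php
// U+1F600 GRINNING FACE, written as raw UTF-8 bytes so that the
// encoding of this source file does not matter.
$utf8 = "\xF0\x9F\x98\x80";

// Convert UTF-8 -> UTF-16. The source encoding is passed explicitly
// instead of relying on the internal encoding.
$utf16 = mb_convert_encoding($utf8, 'UTF-16', 'UTF-8');

// Byte count: one surrogate pair, i.e. two 16-bit code units.
$byteCount = strlen($utf16);                  // 4

// With the encoding named, the string is read as one character.
$charCount = mb_strlen($utf16, 'UTF-16');     // 1
$codePoint = mb_ord($utf16, 'UTF-16');        // 128512 (0x1F600)

// Without the encoding argument, mb_strlen() uses the internal
// encoding (UTF-8 by default) and misreads the UTF-16 bytes, so
// this value is not meaningful.
$misread = mb_strlen($utf16);

// Converting back with the correct source encoding restores the input.
$roundTrip = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16');

var_dump($byteCount, $charCount, dechex($codePoint), $roundTrip === $utf8);
```

The same applies to the other mb_* functions that take an optional $encoding parameter, such as mb_substr and mb_str_split: omit it and the bytes are interpreted using the internal encoding.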
 [2020-01-15 17:48 UTC] pascu dot vlad at gmail dot com
Alrighty...

My bad. I thought once converted, a string would be recognized with the correct encoding on subsequent calls.

You can close this and I'll go wash the egg off my face.
 