Bug #79122 mb_convert_encoding doesn't convert properly to UTF-16
Submitted: 2020-01-15 17:28 UTC
Modified: 2020-01-15 17:30 UTC
From: pascu dot vlad at gmail dot com
Assigned:
Status: Not a bug
Package: Unicode Engine related
PHP Version: Irrelevant
OS: Irrelevant
Private report: No
CVE-ID: None
 [2020-01-15 17:28 UTC] pascu dot vlad at gmail dot com
Description:
------------
mb_convert_encoding does not seem to produce correct results when dealing with higher Unicode characters. When supplied with UTF-8 encoded strings, the mb_* functions return valid results for character count, code point, and byte count. However, after converting to UTF-16, the whole string is mangled.

I have attached 4 links with relevant examples.

Test script:
---------------
https://3v4l.org/LQott
https://3v4l.org/DrEcL
https://3v4l.org/acOTR
https://3v4l.org/5VXFC

Expected result:
----------------
Correct results for the total length (number of characters) and for each character (code point + byte count) in both encodings, as documented on the following pages (listed in the same order as the test scripts):

https://www.iemoji.com/view/emoji/885/smileys-people/grinning-face
https://www.iemoji.com/view/emoji/1659/flags/romania
https://www.iemoji.com/view/emoji/1379/skin-tones/baby-light-skin-tone
https://www.iemoji.com/view/emoji/306/symbols/hash-key

Actual result:
--------------
Results are correct only for the supplied UTF-8 string; none of the converted strings yield valid results.

 [2020-01-15 17:30 UTC] nikic@php.net
-Status: Open
+Status: Not a bug
 [2020-01-15 17:30 UTC] nikic@php.net
You are not passing 'UTF-16' as the $encoding parameter, so all your strings are interpreted as UTF-8.
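
A minimal sketch of the point above, assuming a one-emoji sample string (the linked test scripts are not reproduced here, so the variable names and the sample character are illustrative only). It shows the encoding being passed explicitly to the mb_* calls that inspect the converted string:

```php
<?php
// Illustrative sample, not taken from the report's scripts:
// U+1F600 GRINNING FACE, which takes 4 bytes in UTF-8.
$utf8 = "\u{1F600}";

// Convert the UTF-8 bytes to UTF-16 (source encoding given explicitly).
$utf16 = mb_convert_encoding($utf8, 'UTF-8' === 'UTF-16' ? 'UTF-8' : 'UTF-16', 'UTF-8');

// Without an encoding argument, mb_strlen() reads the bytes using the
// internal encoding (UTF-8 unless configured otherwise), so the count
// does not describe the UTF-16 text.
$misreadLength = mb_strlen($utf16);

// With the encoding passed explicitly, the same bytes are read as UTF-16.
$length    = mb_strlen($utf16, 'UTF-16'); // characters
$codePoint = mb_ord($utf16, 'UTF-16');    // first code point (PHP 7.2+)
$byteCount = strlen($utf16);              // raw bytes: one surrogate pair

printf("chars=%d codepoint=U+%X bytes=%d\n", $length, $codePoint, $byteCount);
```

The same applies to the other mb_* functions that accept an encoding argument: the converted string has to be accompanied by 'UTF-16' on every call.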
 [2020-01-15 17:48 UTC] pascu dot vlad at gmail dot com
Alrighty...

My bad. I thought once converted, a string would be recognized with the correct encoding on subsequent calls.

You can close this and I'll go wash the egg off my face.
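
For readers with the same expectation: a PHP string is a plain byte sequence and does not remember its encoding after conversion. One possible way to get the "recognized on subsequent calls" behaviour is to keep the encoding next to the bytes. The `EncodedText` class below is a hypothetical helper written for illustration, not part of mbstring, and the sample string (the two regional indicator code points of the Romanian flag) is an assumption rather than the content of the linked scripts:

```php
<?php
// Hypothetical value object (not an mbstring API): pairs raw bytes with
// the name of the encoding they are written in.
final class EncodedText
{
    private $bytes;
    private $encoding;

    public function __construct(string $bytes, string $encoding)
    {
        $this->bytes = $bytes;
        $this->encoding = $encoding;
    }

    // Returns a new object holding the converted bytes and the new encoding.
    public function convertTo(string $target): self
    {
        $converted = mb_convert_encoding($this->bytes, $target, $this->encoding);
        return new self($converted, $target);
    }

    // Character count, always computed with the stored encoding.
    public function length(): int
    {
        return mb_strlen($this->bytes, $this->encoding);
    }

    // Raw byte count, independent of any encoding.
    public function byteCount(): int
    {
        return strlen($this->bytes);
    }
}

// U+1F1F7 U+1F1F4: regional indicators R and O (assumed sample).
$utf8  = new EncodedText("\u{1F1F7}\u{1F1F4}", 'UTF-8');
$utf16 = $utf8->convertTo('UTF-16');

printf("UTF-8:  chars=%d bytes=%d\n", $utf8->length(), $utf8->byteCount());
printf("UTF-16: chars=%d bytes=%d\n", $utf16->length(), $utf16->byteCount());
```

Because the encoding travels with the object, callers cannot forget to pass it, which is the mistake described in this report.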
 