Bug #79122 mb_convert_encoding doesn't convert properly to UTF-16
Submitted: 2020-01-15 17:28 UTC Modified: 2020-01-15 17:30 UTC
From: pascu dot vlad at gmail dot com Assigned:
Status: Not a bug Package: Unicode Engine related
PHP Version: Irrelevant OS: Irrelevant
Private report: No CVE-ID: None
 [2020-01-15 17:28 UTC] pascu dot vlad at gmail dot com
Description:
------------
mb_convert_encoding seems not to produce correct results when dealing with higher Unicode characters (emoji). When given UTF-8 encoded strings, the mb_* functions return valid results for character count, code points and byte count. However, after conversion to UTF-16, the whole string is mangled.

I have attached 4 links with relevant examples.

Test script:
---------------
https://3v4l.org/LQott
https://3v4l.org/DrEcL
https://3v4l.org/acOTR
https://3v4l.org/5VXFC

Expected result:
----------------
Correct results for the total length (number of characters) and for each character (code point and byte count) in both encodings, as documented on the following pages (in the same order as the test scripts):

https://www.iemoji.com/view/emoji/885/smileys-people/grinning-face
https://www.iemoji.com/view/emoji/1659/flags/romania
https://www.iemoji.com/view/emoji/1379/skin-tones/baby-light-skin-tone
https://www.iemoji.com/view/emoji/306/symbols/hash-key
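For reference, the expected per-character values can be reproduced outside PHP. The Python sketch below (not part of the original report) writes each emoji from the pages above as its code point sequence and counts the bytes per code point in UTF-8 and in UTF-16. It assumes big-endian UTF-16 without a byte-order mark (Python's `utf-16-be` codec) so the UTF-16 counts are not inflated by a BOM; the `SAMPLES` table and the `describe` helper are hypothetical names.

```python
# Hypothetical sample table: the four emoji from the report as code point sequences.
SAMPLES = {
    "grinning face": "\U0001F600",
    "flag: Romania": "\U0001F1F7\U0001F1F4",          # two regional indicator symbols
    "baby: light skin tone": "\U0001F476\U0001F3FB",  # base emoji + skin tone modifier
    "keycap: #": "#\uFE0F\u20E3",                     # '#' + variation selector + keycap
}


def describe(text: str) -> list:
    """Return (code point, UTF-8 byte count, UTF-16 byte count) for each code point."""
    rows = []
    for ch in text:
        utf8_bytes = len(ch.encode("utf-8"))
        # utf-16-be: big-endian, no BOM, so only the code unit bytes are counted
        utf16_bytes = len(ch.encode("utf-16-be"))
        rows.append((f"U+{ord(ch):04X}", utf8_bytes, utf16_bytes))
    return rows


for name, text in SAMPLES.items():
    print(f"{name}: {len(text)} code points")
    for code_point, utf8_bytes, utf16_bytes in describe(text):
        print(f"  {code_point}: {utf8_bytes} bytes UTF-8, {utf16_bytes} bytes UTF-16")
```

Characters outside the Basic Multilingual Plane, such as U+1F600, take 4 bytes in both encodings (a surrogate pair in UTF-16), while the keycap sequence `#` U+FE0F U+20E3 takes 7 bytes in UTF-8 but 6 bytes in UTF-16.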

Actual result:
--------------
Correct results only for the supplied UTF-8 strings; none of the converted strings produce valid results.

History

 [2020-01-15 17:30 UTC] nikic@php.net
-Status: Open +Status: Not a bug
 [2020-01-15 17:30 UTC] nikic@php.net
You are not passing 'UTF-16' as the $encoding parameter, so all your strings are interpreted as UTF-8.
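The point behind this resolution is that a PHP string is only a sequence of bytes with no encoding attached, so each mb_* call has to be told how to read it (for example `mb_strlen($utf16, 'UTF-16')`, where `$utf16` is a hypothetical variable holding the converted string). The Python sketch below is an analogy rather than PHP code, and its helper names are hypothetical: UTF-16 bytes decoded as UTF-8 come out as garbage, while decoding them with the encoding they were written in recovers the original text.

```python
def misread_as_utf8(text: str) -> str:
    """Encode text as UTF-16 (big-endian), then wrongly decode those bytes as UTF-8."""
    raw = text.encode("utf-16-be")
    # Invalid UTF-8 sequences become U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")


def read_as_utf16(text: str) -> str:
    """Encode text as UTF-16 (big-endian) and decode it with that same encoding."""
    raw = text.encode("utf-16-be")
    return raw.decode("utf-16-be")


grinning_face = "\U0001F600"
print(len(misread_as_utf8(grinning_face)), repr(misread_as_utf8(grinning_face)))
print(len(read_as_utf16(grinning_face)), repr(read_as_utf16(grinning_face)))
```

The bytes are identical in both functions; only the encoding used to interpret them differs, which mirrors passing or omitting the $encoding argument in PHP.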
 [2020-01-15 17:48 UTC] pascu dot vlad at gmail dot com
Alrighty...

My bad. I thought once converted, a string would be recognized with the correct encoding on subsequent calls.

You can close this and I'll go wash the egg off my face.
 
PHP Copyright © 2001-2021 The PHP Group
All rights reserved.
Last updated: Wed Oct 27 20:03:34 2021 UTC