Bug #79122 mb_convert_encoding doesn't convert properly to UTF-16
Submitted: 2020-01-15 17:28 UTC Modified: 2020-01-15 17:30 UTC
From: pascu dot vlad at gmail dot com Assigned:
Status: Not a bug Package: Unicode Engine related
PHP Version: Irrelevant OS: Irrelevant
Private report: No CVE-ID: None
 [2020-01-15 17:28 UTC] pascu dot vlad at gmail dot com
mb_convert_encoding does not seem to produce correct results when dealing with higher Unicode characters. When supplied with UTF-8 encoded strings, the mb_* functions return valid results for character count, code point and byte count. However, after converting to UTF-16, the whole string is mangled.

I have attached 4 links with relevant examples.

Test script:

Expected result:
Correct result for total length (number of characters) and each character (code point + byte count) in both encodings, as documented here (links in order with code examples):

Actual result:
Correct result only for the supplied UTF-8 string and no valid results for any of the converted strings.


 [2020-01-15 17:30 UTC]
-Status: Open +Status: Not a bug
 [2020-01-15 17:30 UTC]
You are not passing 'UTF-16' as the $encoding parameter, so all your strings are interpreted as UTF-8.
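
A minimal sketch of the point above, not taken from the reporter's test script (which is not included in this report). The sample string and variable names are made up for illustration, and 'UTF-16BE' is used instead of 'UTF-16' only to make the byte order explicit. The converted value is a plain byte string, so every later mb_* call has to be told which encoding it holds:

```php
<?php
// Illustrative input: U+1F600 (a code point outside the BMP) followed by "a".
$utf8 = "\u{1F600}a";

// The result is just bytes; PHP strings carry no encoding tag.
$utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');

// No encoding argument: the internal encoding (UTF-8 by default) is assumed,
// so the UTF-16 bytes are misinterpreted.
$misread = mb_strlen($utf16);

// Explicit encoding argument: the bytes are read as UTF-16.
$length = mb_strlen($utf16, 'UTF-16BE');

// Byte count: one surrogate pair (4 bytes) plus one 2-byte code unit.
$bytes = strlen($utf16);

// First character and its code point, again with the encoding passed explicitly.
$first = mb_substr($utf16, 0, 1, 'UTF-16BE');
$codePoint = mb_ord($first, 'UTF-16BE');

printf("length=%d bytes=%d first=U+%X\n", $length, $bytes, $codePoint);
```

The same applies to any other mb_* function that accepts an encoding parameter. Calling mb_internal_encoding('UTF-16BE') would change the default instead, but it would then also apply to the original UTF-8 string.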
 [2020-01-15 17:48 UTC] pascu dot vlad at gmail dot com

My bad. I thought once converted, a string would be recognized with the correct encoding on subsequent calls.

You can close this and I'll go wash the egg off my face.
Last updated: Mon Jun 17 19:01:30 2024 UTC