Bug #44014 mb_convert_encoding 'destroys' first character (UTF16->UTF8)
Submitted: 2008-02-01 12:08 UTC Modified: 2008-02-24 01:00 UTC
Votes:2
Avg. Score:4.5 ± 0.5
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: michael202 at gmx dot de Assigned: hirokawa
Status: No Feedback Package: mbstring related
PHP Version: 5.2.5 OS: Win XP
Private report: No CVE-ID: None
 [2008-02-01 12:08 UTC] michael202 at gmx dot de
Description:
------------
mb_convert_encoding 'destroys' first character when
converting from UTF16 to UTF8

(iconv works).

Reproduce code:
---------------
$utf16 = chr(0xFF).chr(0xFE).chr(0x4d).chr(0).chr(0x6f).chr(0); //'Mo'

$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16');  

echo($utf8 . "\n");     // -> ?++???o

$utf8 = iconv('UTF-16', 'UTF-8', $utf16);  

echo($utf8 . "\n");     // -> Mo 


Expected result:
----------------
mb:    (BOM8)Mo
iconv: Mo

(BOM8) is a placeholder

Actual result:
--------------
mb:    (BOM8)???o  (copied from cmd shell)
iconv: Mo

(BOM8) is a placeholder



 [2008-02-05 05:10 UTC] jani@php.net
Assigned to the mbstring maintainer.
 [2008-02-16 12:17 UTC] hirokawa@php.net
The encoding conversion functions in mbstring do not support the
Unicode BOM.

Also, big endian is the default for 'UTF-16'. Please specify 'UTF-16LE'
if your data is in little endian format.

Try,

<?php
$utf16 = chr(0).chr(0x4d).chr(0).chr(0x6f); //'Mo'
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16'); 
echo($utf8 . "\n");     // -> Mo
?>

or

<?php
$utf16 = chr(0x4d).chr(0).chr(0x6f).chr(0); //'Mo'
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE'); 
echo($utf8 . "\n");     // -> Mo
?>
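
The two variants above can be combined into one helper that looks at the first two bytes, strips a BOM if one is present, and then passes an explicit byte order to mb_convert_encoding. This is only a minimal sketch: utf16_to_utf8() is a hypothetical name, not part of mbstring, and falling back to big endian when no BOM is found is an assumption taken from the comment above.

```php
<?php
// Hypothetical helper (not part of mbstring): strip a UTF-16 BOM, if any,
// and convert using an explicit byte order instead of plain 'UTF-16'.
function utf16_to_utf8($bytes, $default = 'UTF-16BE')
{
    $bom = (string) substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {
        // Little endian BOM: drop it and decode as UTF-16LE.
        $encoding = 'UTF-16LE';
        $bytes = (string) substr($bytes, 2);
    } elseif ($bom === "\xFE\xFF") {
        // Big endian BOM: drop it and decode as UTF-16BE.
        $encoding = 'UTF-16BE';
        $bytes = (string) substr($bytes, 2);
    } else {
        // No BOM: assume the caller-supplied default byte order.
        $encoding = $default;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $encoding);
}

// The string from the original report (little endian BOM followed by 'Mo').
$utf16 = chr(0xFF).chr(0xFE).chr(0x4d).chr(0).chr(0x6f).chr(0);
echo(utf16_to_utf8($utf16) . "\n");     // -> Mo
?>
```

Because the BOM is removed before the conversion, the result does not depend on how a given mbstring version treats a BOM inside 'UTF-16' input.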

 [2008-02-24 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2008-03-28 09:44 UTC] d_kelsey at uk dot ibm dot com
My understanding of UTF-16 is that the BOM is mandatory. With mbstring, I have found that if I pass a UTF-16 string with a BOM to mb_convert_encoding (for example, to convert it to UTF-8), it treats the BOM as UTF-16 data and converts it.

mbstring also doesn't generate a BOM when converting to UTF-16, so, if the BOM is indeed mandatory, it isn't generating valid UTF-16 bytes.

I see that mbstring effectively uses UTF-16BE when you specify UTF-16.

If mbstring doesn't support the BOM, then UTF-16 cannot be handled properly. Should this at least be documented, with a recommendation to specify UTF-16BE explicitly so that it is clear what is supported?
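
For the output direction described in this comment, a caller who needs a BOM can prepend it by hand. The following is a rough sketch under that assumption; utf8_to_utf16_with_bom() is a hypothetical name and not an mbstring function. It uses the explicit 'UTF-16LE' and 'UTF-16BE' encodings, which do not emit a BOM themselves.

```php
<?php
// Hypothetical helper (not part of mbstring): convert UTF-8 to UTF-16 with
// an explicit byte order and prepend the matching BOM manually.
function utf8_to_utf16_with_bom($utf8, $littleEndian = true)
{
    $encoding = $littleEndian ? 'UTF-16LE' : 'UTF-16BE';
    $bom      = $littleEndian ? "\xFF\xFE" : "\xFE\xFF";
    return $bom . mb_convert_encoding($utf8, $encoding, 'UTF-8');
}

echo(bin2hex(utf8_to_utf16_with_bom('Mo')) . "\n");          // -> fffe4d006f00
echo(bin2hex(utf8_to_utf16_with_bom('Mo', false)) . "\n");   // -> feff004d006f
?>
```

Naming the byte order explicitly keeps the BOM and the encoded data consistent with each other, whatever plain 'UTF-16' defaults to.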
 
Last updated: Wed Jan 15 14:01:30 2025 UTC