Bug #49528 UTF-16 strings prefixed by BOMs wrongly converted
Submitted: 2009-09-11 07:45 UTC Modified: 2009-09-23 14:27 UTC
From: Assigned: moriyoshi
Status: Closed Package: mbstring related
PHP Version: 5.3SVN-2009-09-11 (SVN) OS: *
Private report: No CVE-ID:
 [2009-09-11 07:45 UTC]
The first character of a UTF-16 string prefixed by "\xff\xfe" (the little-endian BOM) is converted to the wrong Unicode code point. Moreover, the resulting string contains the BOM itself, which should not be there.

Reproduce code:
var_dump(bin2hex(mb_convert_encoding("\xff\xfe\x01\x02\x03\x04", "UCS-2", "UTF-16")));

Expected result:
string(8) "02010403"

Actual result:
string(12) "feffff010403"
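
The expected value can be checked by decoding the sample bytes by hand, without mbstring. This is only an illustrative sketch: it assumes the input is little-endian (as the "\xff\xfe" BOM indicates), that all code points are in the BMP, and that the "UCS-2" target means big-endian output, which is what the expected result above implies.

```php
<?php
// Illustrative only: decode the sample bytes by hand, without mbstring.
$input = "\xff\xfe\x01\x02\x03\x04";

// "v" reads unsigned 16-bit little-endian units; unpack() keys start at 1,
// so reindex from 0 with array_values().
$units = array_values(unpack('v*', $input));

// The first unit is the BOM (U+FEFF read as little-endian); drop it.
if ($units[0] === 0xFEFF) {
    array_shift($units);
}

// "n" writes unsigned 16-bit big-endian units, i.e. UCS-2BE for BMP code points.
$ucs2 = pack('n*', ...$units);

var_dump(bin2hex($ucs2)); // string(8) "02010403"
```

The two remaining units are U+0201 and U+0403, which gives the four bytes 02 01 04 03 once written big-endian.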


 [2009-09-11 08:18 UTC]
It can be argued that the BOM character U+FEFF should never be converted, as it is no real character.

I don't think it is the task of mb_convert_encoding to detect the byte order and interpret the BOM.
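
For callers who would rather not depend on mb_convert_encoding interpreting the BOM, the byte order can be detected on the caller's side and passed explicitly. A minimal sketch follows; utf16_to_ucs2be() is a hypothetical helper, not part of mbstring, and treating BOM-less input as big-endian is an assumption made here.

```php
<?php
// Hypothetical helper (not part of mbstring): strip a leading UTF-16 BOM
// and convert using the explicit byte order that the BOM indicates.
function utf16_to_ucs2be(string $bytes): string
{
    $bom = substr($bytes, 0, 2);

    if ($bom === "\xff\xfe") {
        $from  = 'UTF-16LE';
        $bytes = (string) substr($bytes, 2); // drop the little-endian BOM
    } elseif ($bom === "\xfe\xff") {
        $from  = 'UTF-16BE';
        $bytes = (string) substr($bytes, 2); // drop the big-endian BOM
    } else {
        $from = 'UTF-16BE'; // assumption: no BOM means big-endian
    }

    // Explicit source and target byte orders leave nothing for the
    // converter to guess.
    return mb_convert_encoding($bytes, 'UCS-2BE', $from);
}

var_dump(bin2hex(utf16_to_ucs2be("\xff\xfe\x01\x02\x03\x04"))); // string(8) "02010403"
```

Because the BOM is removed before the conversion and both encodings name their byte order, the result does not depend on how a given PHP build handles a BOM in plain "UTF-16" input.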
 [2009-09-11 08:21 UTC]
Moriyoshi probably added this report as a reminder for himself.
 [2009-09-11 08:22 UTC]
Automatic comment from SVN on behalf of moriyoshi
Log: - Fix bug #49528 (UTF-16 strings prefixed by BOM wrongly converted).
 [2009-09-11 14:34 UTC]
Fixed -> closed (?) (or did you leave this open just for fun?)
 [2009-09-11 21:07 UTC]
I left this open because the patch has not been merged into 5.2 yet.

 [2009-09-23 14:27 UTC]
This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
Thank you for the report, and for helping us make PHP better.

PHP Copyright © 2001-2015 The PHP Group
All rights reserved.
Last updated: Wed Dec 02 03:01:32 2015 UTC