Bug #70114 mb_convert_encoding UTF-16 and UTF-16LE BOM behaviour
Submitted: 2015-07-22 13:27 UTC Modified: 2021-07-20 11:42 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 1 (0.0%)
From: cph at osde dot uk Assigned: cmb (profile)
Status: Not a bug Package: mbstring related
PHP Version: 5.6.11 OS: ubuntu
Private report: No CVE-ID: None
 [2015-07-22 13:27 UTC] cph at osde dot uk
Description:
------------
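mb_convert_encoding() treats a leading BOM differently depending on whether the source encoding is given as UTF-16 or UTF-16LE. With 'UTF-16' the BOM is stripped from the result; with 'UTF-16LE' it is kept and converted to a UTF-8 BOM (EF BB BF). May be related to bug.php?id=34776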

Test script:
---------------
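<?php
// Hypothetical path: any UTF-16LE file that starts with a BOM (FF FE).
$fh = fopen('utf16le-with-bom.txt', 'rb');
$utf16 = fgets($fh, 1024);
fclose($fh);

// RAW: the bytes as read, including the BOM.
echo bin2hex($utf16).PHP_EOL;

// IC: drop the first UTF-16LE character (the BOM), then convert with iconv.
$utf8 = iconv('UTF-16LE','UTF-8',mb_substr($utf16,1,null,'UTF-16LE'));
echo bin2hex($utf8).PHP_EOL;
var_export($utf8);

// 16: convert from UTF-16, leaving the BOM in the input.
$utf8 = mb_convert_encoding($utf16,'UTF-8','UTF-16');
echo bin2hex($utf8).PHP_EOL;
var_export($utf8);

// 16LE: convert from UTF-16LE, leaving the BOM in the input.
$utf8 = mb_convert_encoding($utf16,'UTF-8','UTF-16LE');
echo bin2hex($utf8).PHP_EOL;
var_export($utf8);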


Expected result:
----------------
RAW
fffe48006f00740065006c00...

IC
486f74656c49447c22486f74...
'Hot

16
486f74656c49447c22486f74...
'Hot

16LE
486f74656c49447c22486f74...
'Hot

Actual result:
--------------
RAW
fffe48006f00740065006c00...

IC
486f74656c49447c22486f74...
'Hot

16
486f74656c49447c22486f74...
'Hot

16LE
efbbbf486f74656c49447c22...
'Hot

 [2015-07-22 16:52 UTC] cmb@php.net
I do not consider this behavior to be a bug. When a BOM is passed,
conversion from UTF-16 strips the BOM, while conversion from
UTF-16LE doesn't (both make sense). Either way you get a valid
UTF-8 string.
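
For anyone who hits the same difference, a minimal sketch of the behaviour described above, using a short inline string instead of the reporter's file. The input bytes and the helper strip_utf8_bom() are illustrative assumptions and not part of the original report or of mbstring; the helper simply drops a leading EF BB BF if one is present.

```php
<?php
// Hypothetical helper (not part of mbstring): remove a leading UTF-8 BOM.
function strip_utf8_bom(string $s): string
{
    return strncmp($s, "\xEF\xBB\xBF", 3) === 0 ? substr($s, 3) : $s;
}

// Illustrative input: UTF-16LE BOM (FF FE) followed by "Hot" in UTF-16LE.
$utf16 = "\xFF\xFE" . "H\x00o\x00t\x00";

// Source encoding 'UTF-16': the BOM selects the byte order and is consumed.
$viaUtf16 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16');

// Source encoding 'UTF-16LE': the BOM is an ordinary character (U+FEFF)
// and is converted like any other, yielding a UTF-8 BOM.
$viaUtf16le = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');

echo bin2hex($viaUtf16), PHP_EOL;
echo bin2hex($viaUtf16le), PHP_EOL;

// Normalising both results so they compare equal.
$cleanUtf16   = strip_utf8_bom($viaUtf16);
$cleanUtf16le = strip_utf8_bom($viaUtf16le);
var_dump($cleanUtf16 === $cleanUtf16le);
```

Stripping the BOM after conversion keeps the call sites identical for both source encodings; the reporter's alternative of removing the first character with mb_substr() before conversion works as well, provided the input is known to start with a BOM.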
 [2015-07-22 19:04 UTC] cph at osde dot uk
Thanks for investigating.
 [2021-07-20 11:42 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2021-07-20 11:42 UTC] cmb@php.net
Well, closing per my previous comment.