Bug #70114 mb_convert_encoding UTF-16 and UTF-16LE BOM behaviour
Submitted: 2015-07-22 13:27 UTC Modified: 2021-07-20 11:42 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 1 (0.0%)
From: cph at osde dot uk Assigned: cmb
Status: Not a bug Package: mbstring related
PHP Version: 5.6.11 OS: ubuntu
Private report: No CVE-ID: None

 [2015-07-22 13:27 UTC] cph at osde dot uk
Description:
------------
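mb_convert_encoding() strips a leading BOM when converting from UTF-16, but keeps it (as EF BB BF in the UTF-8 output) when converting from UTF-16LE. This may be related to bug.php?id=34776.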

Test script:
---------------
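        <?php
        // $fh was not shown in the report; 'input.txt' is a hypothetical
        // UTF-16LE file that starts with a BOM (FF FE).
        $fh = fopen('input.txt', 'rb');
        // fgets() reads at most 1023 bytes and stops after the single byte 0x0A,
        // so it can split a UTF-16 code unit (a UTF-16LE newline is 0A 00).
        $utf16 = fgets($fh, 1024);

        echo 'RAW'.PHP_EOL.bin2hex($utf16).PHP_EOL;

        // iconv, after mb_substr() drops the first character (assumed to be the BOM)
        $utf8 = iconv('UTF-16LE','UTF-8',mb_substr($utf16,1,null,'UTF-16LE'));
        echo PHP_EOL.'IC'.PHP_EOL.bin2hex($utf8).PHP_EOL;
        var_export($utf8);

        // mb_convert_encoding() with the byte order taken from the BOM
        $utf8 = mb_convert_encoding($utf16,'UTF-8','UTF-16');
        echo PHP_EOL.PHP_EOL.'16'.PHP_EOL.bin2hex($utf8).PHP_EOL;
        var_export($utf8);

        // mb_convert_encoding() with an explicit little-endian byte order
        $utf8 = mb_convert_encoding($utf16,'UTF-8','UTF-16LE');
        echo PHP_EOL.PHP_EOL.'16LE'.PHP_EOL.bin2hex($utf8).PHP_EOL;
        var_export($utf8);

        fclose($fh);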


Expected result:
----------------
RAW
fffe48006f00740065006c00...

IC
486f74656c49447c22486f74...
'Hot

16
486f74656c49447c22486f74...
'Hot

16LE
486f74656c49447c22486f74...
'Hot

Actual result:
--------------
RAW
fffe48006f00740065006c00...

IC
486f74656c49447c22486f74...
'Hot

16
486f74656c49447c22486f74...
'Hot

16LE
efbbbf486f74656c49447c22...
'Hot
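
The difference can be reproduced without the input file. This is a minimal, self-contained sketch of the same comparison; it assumes the input consists only of the leading bytes visible on the RAW line (FF FE followed by "Hotel" in UTF-16LE), because the rest of the file is not shown in the report.

```php
<?php
// Only the leading bytes from the RAW dump: FF FE (BOM) + "Hotel" in UTF-16LE.
$utf16 = hex2bin('fffe48006f00740065006c00');

// Source encoding UTF-16: the BOM selects little-endian order and is consumed.
$fromUtf16 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16');

// Source encoding UTF-16LE: U+FEFF is kept and becomes EF BB BF in UTF-8.
$fromUtf16le = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');

echo bin2hex($fromUtf16), PHP_EOL;
echo bin2hex($fromUtf16le), PHP_EOL;
```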

History

 [2015-07-22 16:52 UTC] cmb@php.net
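I do not consider this behavior to be a bug. When the input starts
with a BOM, conversion from UTF-16 strips the BOM, because that is
what determines the byte order. Conversion from UTF-16LE doesn't
strip it: once the byte order is given explicitly, U+FEFF is an
ordinary character (ZERO WIDTH NO-BREAK SPACE), which is encoded in
UTF-8 as EF BB BF. Both make sense, and either way you get a valid
UTF-8 string.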
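
If the BOM-free result is wanted while still naming UTF-16LE explicitly, one option is to drop a leading FF FE before converting. This is a minimal sketch, not an mbstring feature; `utf16leToUtf8` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert UTF-16LE to UTF-8, dropping one leading BOM if present.
function utf16leToUtf8(string $bytes): string
{
    // In UTF-16LE a byte order mark is the two bytes FF FE.
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        $bytes = substr($bytes, 2);
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}

$withBom    = "\xFF\xFE" . "H\x00o\x00t\x00";
$withoutBom = "H\x00o\x00t\x00";

var_export(utf16leToUtf8($withBom));
echo PHP_EOL;
var_export(utf16leToUtf8($withoutBom));
echo PHP_EOL;
```

Only the start of the input is checked on purpose: a U+FEFF later in the text is a regular character and is left alone.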
 [2015-07-22 19:04 UTC] cph at osde dot uk
Thanks for investigating.
 [2021-07-20 11:42 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2021-07-20 11:42 UTC] cmb@php.net
Well, closing per my previous comment.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Mon Dec 30 17:01:29 2024 UTC