Bug #68846 False detection of CJK Unified Ideographs Extension E
Submitted: 2015-01-16 14:26 UTC
Modified: 2015-03-09 06:48 UTC
From: masakielastic at gmail dot com
Assigned: stas
Status: Closed
Package: mbstring related
PHP Version: 5.6.4
OS: Mac OS X
Private report: No
CVE-ID: None
 [2015-01-16 14:26 UTC] masakielastic at gmail dot com
Description:
------------
mb_check_encoding returns false if the encoding is GB18030 and the string contains a character from CJK Unified Ideographs Extension E (U+2B820..U+2CEA1), which Unicode Version 8.0 includes.

http://d.hatena.ne.jp/NAOI/20130828/1377674904
http://blogs.adobe.com/CCJKType/2015/01/cjk-unified-ideographs-unicode-8.html
http://blogs.adobe.com/CCJKType/files/2015/01/ideographs-unicode-9.pdf

Test script:
---------------
<?php
// U+2B864 (CJK Unified Ideographs Extension E), given as UTF-8 bytes
$str = mb_convert_encoding("\xF0\xAB\xA1\xA4", 'GB18030', 'UTF-8');
var_dump(
    mb_check_encoding($str, 'GB18030')
);

Expected result:
----------------
true

Actual result:
--------------
false
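
For reference, a minimal sketch of why the string should be considered valid: GB18030 maps every supplementary-plane code point (U+10000..U+10FFFF) linearly onto four-byte sequences starting at 90 30 81 30, so Extension E needs no table entries of its own. The helper name gb18030_supplementary_bytes is hypothetical and is not part of mbstring; the sketch assumes PHP 7 or later (for intdiv and scalar type hints) and does not call mbstring at all.

```php
<?php
// Hypothetical helper (not part of mbstring): encode a supplementary-plane
// code point as a four-byte GB18030 sequence using the linear mapping.
function gb18030_supplementary_bytes(int $cp): string
{
    if ($cp < 0x10000 || $cp > 0x10FFFF) {
        throw new InvalidArgumentException('Code point outside U+10000..U+10FFFF');
    }

    // Offset from the first supplementary code point, which maps to 90 30 81 30.
    $index = $cp - 0x10000;

    // Peel off the bytes from last to first: the byte ranges are
    // 30..39 (10 values), 81..FE (126 values), 30..39 (10 values), 90..E3.
    $b4 = 0x30 + $index % 10;
    $index = intdiv($index, 10);
    $b3 = 0x81 + $index % 126;
    $index = intdiv($index, 126);
    $b2 = 0x30 + $index % 10;
    $b1 = 0x90 + intdiv($index, 10);

    return chr($b1) . chr($b2) . chr($b3) . chr($b4);
}

// U+2B864, the character used in the test script.
echo strtoupper(bin2hex(gb18030_supplementary_bytes(0x2B864))), "\n"; // 9839BD30
```

Comparing bin2hex($str) from the test script against this value is one way to tell whether the conversion or only the check is at fault.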

 [2015-02-18 15:26 UTC] masakielastic at gmail dot com
I created a pull request.

https://github.com/php/php-src/pull/1092
 [2015-03-09 06:48 UTC] stas@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: stas
 [2015-03-09 06:48 UTC] stas@php.net
The fix for this bug has been committed.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

For Windows:

http://windows.php.net/snapshots/
 
Thank you for the report, and for helping us make PHP better.


 