Bug #46131 mb_check_encoding returns wrong result when using iso-2022-jp character set
Submitted: 2008-09-19 20:16 UTC
Modified: 2008-11-16 01:00 UTC
From: areid at lumerical dot com
Assigned: hirokawa
Status: No Feedback
Package: mbstring related
PHP Version: 5.2.6
OS: RHEL5
Private report: No
CVE-ID: None

 [2008-09-19 20:16 UTC] areid at lumerical dot com
Description:
------------
The mb_check_encoding function returns false when a particular Japanese character is checked against the ISO-2022-JP character set. The offending character has the two-byte hex code 2d6a. It is a special character representing "incorporated". The character itself does not seem to be in the JIS X 0208-1983 character table, but most Windows applications seem to recognize it (Outlook, Firefox, Explorer, etc.). In this particular case, the original text was composed in Outlook.
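
To see why 2d6a falls outside the standard table, it can help to convert the two bytes to a row/cell position. The sketch below is only an illustration: `jis_to_kuten` is a hypothetical helper name, not part of mbstring, and the row ranges in the comments reflect the usual description of JIS X 0208 (rows 9 to 15 unassigned) rather than anything stated in this report.

```php
<?php
// Hypothetical helper (not an mbstring function): convert a two-byte
// JIS code, as it appears inside ISO-2022-JP, to its row/cell position.
// Each byte is offset by 0x20 from its 1-based row or cell number.
function jis_to_kuten(int $hi, int $lo): array
{
    return [$hi - 0x20, $lo - 0x20];
}

[$row, $cell] = jis_to_kuten(0x2d, 0x6a);
printf("row %d, cell %d\n", $row, $cell); // row 13, cell 74

// Assumption: JIS X 0208 leaves rows 9 to 15 unassigned, and vendors
// placed their own extension characters there.
$inUnassignedRows = ($row >= 9 && $row <= 15);
echo $inUnassignedRows ? "vendor extension area\n" : "standard area\n";
```

Under that assumption, row 13 is a vendor extension row, which would be consistent with the character being accepted by Windows applications but rejected by a strict ISO-2022-JP check.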

Reproduce code:
---------------
//This is valid iso-2022-jp code for
//this single Japanese character representing incorporated
$txt = "\x1b\x24\x42\x2d\x6a";

//The output of the below code will be "bad encoding"
if(mb_check_encoding($txt,'ISO-2022-JP')){
        echo 'good encoding';
}else{
        echo 'bad encoding';
}

Expected result:
----------------
"good encoding" should be printed

Actual result:
--------------
"bad encoding" is printed

 [2008-10-30 10:42 UTC] felipe@php.net
Assigned to the maintainer.
 [2008-11-08 03:00 UTC] hirokawa@php.net
$txt = "\x1b\x24\x42\x2d\x6a"
is not a correctly encoded ISO-2022-JP string.
It should be
$txt = "\x1b\x24\x42\x2d\x6a\x1b\x28\x42".

Please try:
<?php
$txt = "\x1b\x24\x42\x2d\x6a\x1b\x28\x42";
if(mb_check_encoding($txt,'ISO-2022-JP')){
        echo 'good encoding';
} else{
        echo 'bad encoding';
}
?>

result: good encoding
 [2008-11-08 03:03 UTC] hirokawa@php.net
ISO-2022-JP doesn't include vendor-specific characters.
Please use ISO-2022-JP-MS instead of ISO-2022-JP.

Also,
$txt = "\x1b\x24\x42\x2d\x6a"
is not a correctly encoded ISO-2022-JP string.
It should be
$txt = "\x1b\x24\x42\x2d\x6a\x1b\x28\x42".

Please try:
<?php
$txt = "\x1b\x24\x42\x2d\x6a\x1b\x28\x42";
if(mb_check_encoding($txt,'ISO-2022-JP-MS')){
        echo 'good encoding';
} else{
        echo 'bad encoding';
}
?>

result: good encoding
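
The two points in this comment (switch back to ASCII at the end, and check against ISO-2022-JP-MS for vendor characters) can be sketched as follows. This is a minimal illustration, not mbstring's own logic: `iso2022jp_ends_in_ascii` and `iso2022jp_terminate` are hypothetical helper names, and the sketch assumes the input uses only three-byte escape sequences of the form ESC ( x or ESC $ x. No result is asserted for `mb_check_encoding`, since it may differ between PHP versions.

```php
<?php
const ESC_JISX0208_1983 = "\x1b\x24\x42"; // ESC $ B: switch to JIS X 0208-1983
const ESC_ASCII         = "\x1b\x28\x42"; // ESC ( B: switch back to ASCII

// Hypothetical helper: true if the last escape sequence in $bytes selects
// ASCII, or if there is no escape sequence at all (ASCII is the initial state).
function iso2022jp_ends_in_ascii(string $bytes): bool
{
    preg_match_all('/\x1b[\x28\x24][\x40-\x7e]/', $bytes, $m);
    if (empty($m[0])) {
        return true;
    }
    return end($m[0]) === ESC_ASCII;
}

// Hypothetical helper: append ESC ( B when the string is left in another state.
function iso2022jp_terminate(string $bytes): string
{
    return iso2022jp_ends_in_ascii($bytes) ? $bytes : $bytes . ESC_ASCII;
}

$txt = iso2022jp_terminate(ESC_JISX0208_1983 . "\x2d\x6a");
echo bin2hex($txt), "\n"; // 1b24422d6a1b2842

// Compare the strict and the vendor-extended encodings on the same bytes.
if (function_exists('mb_check_encoding')) {
    foreach (['ISO-2022-JP', 'ISO-2022-JP-MS'] as $enc) {
        echo $enc, ': ';
        var_dump(mb_check_encoding($txt, $enc));
    }
}
```

Appending the terminator only when needed keeps already well-formed input unchanged, so the helper can be applied to mail bodies from mixed sources before validation.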
 [2008-11-16 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 
Last updated: Sun Apr 28 13:01:29 2024 UTC