Bug #46131 mb_check_encoding returns wrong result when using iso-2022-jp character set
Submitted: 2008-09-19 20:16 UTC Modified: 2008-11-16 01:00 UTC
From: areid at lumerical dot com Assigned: hirokawa (profile)
Status: No Feedback Package: mbstring related
PHP Version: 5.2.6 OS: RHEL5
Private report: No CVE-ID: None
 [2008-09-19 20:16 UTC] areid at lumerical dot com
Description:
------------
The mb_check_encoding function returns false when a particular Japanese character is used with the ISO-2022-JP character set. The offending character has hex code 0x2D6A and is a special character meaning "incorporated". The character does not appear to be in the JIS X 0208-1983 character table, but most Windows applications recognize it (Outlook, Firefox, Explorer, etc.). In this particular case, the original text was composed in Outlook.
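
As a rough check of the claim that the character falls outside JIS X 0208, the two bytes can be turned into a row and cell number by subtracting 0x20 from each. The helper name `jis_to_kuten` below is hypothetical and only for illustration; it is not part of mbstring. The sketch relies on the fact that JIS X 0208 leaves rows 9 through 15 unassigned.

```php
<?php
// Hypothetical helper (not an mbstring function): split a two-byte
// JIS code into its row and cell numbers.
function jis_to_kuten(int $code): array
{
    $row  = (($code >> 8) & 0xFF) - 0x20; // first byte minus 0x20
    $cell = ($code & 0xFF) - 0x20;        // second byte minus 0x20
    return [$row, $cell];
}

[$row, $cell] = jis_to_kuten(0x2D6A);
printf("row %d, cell %d\n", $row, $cell); // row 13, cell 74

// Rows 9 through 15 are unassigned in JIS X 0208, so a code in that
// range can only come from a vendor extension.
var_dump($row >= 9 && $row <= 15); // bool(true)
```

Row 13 is in the unassigned range, which is consistent with the character being a vendor extension rather than part of the standard table.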

Reproduce code:
---------------
//This is valid iso-2022-jp code for
//this single Japanese character representing incorporated
$txt = "\x1b\x24\x42\x2d\x6a";

//The output of the below code will be "bad encoding"
if(mb_check_encoding($txt,'ISO-2022-JP')){
        echo 'good encoding';
}else{
        echo 'bad encoding';
}

Expected result:
----------------
"good encoding" should be printed

Actual result:
--------------
"bad encoding" is printed

History

 [2008-10-30 10:42 UTC] felipe@php.net
Assigned to the maintainer.
 [2008-11-08 03:00 UTC] hirokawa@php.net
$txt = "\x1b\x24\x42\x2d\x6a"
is not a correctly encoded ISO-2022-JP string: it is missing the trailing escape sequence that switches back to ASCII.
It should be
$txt = "\x1b\x24\x42\x2d\x6a\x1b\x28\x42".

Please try:
<?php
$txt = "\x1b\x24\x42\x2d\x6a\x1b\x28\x42";
if(mb_check_encoding($txt,'ISO-2022-JP')){
        echo 'good encoding';
} else{
        echo 'bad encoding';
}
?>

result: good encoding
 [2008-11-08 03:03 UTC] hirokawa@php.net
ISO-2022-JP does not include vendor-specific characters.
Please use ISO-2022-JP-MS instead of ISO-2022-JP.

Also,
$txt = "\x1b\x24\x42\x2d\x6a"
is not a correctly encoded ISO-2022-JP string.
It should be
$txt = "\x1b\x24\x42\x2d\x6a\x1b\x28\x42".

Please try:
<?php
$txt = "\x1b\x24\x42\x2d\x6a\x1b\x28\x42";
if(mb_check_encoding($txt,'ISO-2022-JP-MS')){
        echo 'good encoding';
} else{
        echo 'bad encoding';
}
?>

result: good encoding
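
For anyone hitting the same problem with text from a mail client, here is a minimal sketch of how the missing trailing escape sequence could be detected and appended before validating. The function names `iso2022jp_ends_in_ascii` and `iso2022jp_close` are hypothetical helpers, not mbstring functions. The sketch assumes the input is ISO-2022-JP, where the ESC byte (0x1B) only ever appears at the start of an escape sequence. The results of mb_check_encoding are left unannotated because they may differ between PHP versions.

```php
<?php
// Hypothetical helper: true if the string ends in ASCII mode, i.e. it
// has no escape sequence at all or its last one is ESC ( B.
function iso2022jp_ends_in_ascii(string $bytes): bool
{
    $pos = strrpos($bytes, "\x1b");
    if ($pos === false) {
        return true; // never left ASCII
    }
    return substr($bytes, $pos, 3) === "\x1b\x28\x42"; // ESC ( B
}

// Hypothetical helper: append ESC ( B when the string does not
// already end in ASCII mode.
function iso2022jp_close(string $bytes): string
{
    return iso2022jp_ends_in_ascii($bytes) ? $bytes : $bytes . "\x1b\x28\x42";
}

$original = "\x1b\x24\x42\x2d\x6a";     // string from the report
$closed   = iso2022jp_close($original);
echo bin2hex($closed), "\n";            // 1b24422d6a1b2842

// Compare both encoding names if mbstring is available.
if (function_exists('mb_check_encoding')) {
    foreach (['ISO-2022-JP', 'ISO-2022-JP-MS'] as $encoding) {
        echo $encoding, ': ';
        var_dump(mb_check_encoding($closed, $encoding));
    }
}
```

Appending the sequence only when it is missing keeps already well-formed input unchanged.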
 [2008-11-16 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 