Bug #47366 mb_convert_encoding() converts some symbols incorrectly from EUC-JP to UTF-8
Submitted: 2009-02-12 10:04 UTC Modified: 2009-02-16 15:13 UTC
From: max at injapan dot ru Assigned:
Status: Closed Package: mbstring related
PHP Version: 5.3CVS-2009-02-12 (snap) OS: CentOS 5.2
Private report: No CVE-ID: None

 

 [2009-02-12 10:04 UTC] max at injapan dot ru
Description:
------------
mb_convert_encoding() converts the byte sequences \xAD\xB5 through 
\xAD\xBF incorrectly from EUC-JP to UTF-8. Other symbols may be 
converted incorrectly as well, but I have not been able to check the 
full range.

Unicode has corresponding code points, e.g. U+2161 for Ⅱ.

The majority of EUC-JP text is converted normally.

Reproduce code:
---------------
echo mb_convert_encoding("\xAD\xB6", "UTF-8", "EUC-JP");

Expected result:
----------------
string ?Ⅱ? (U+2161)
printed to STDOUT

Actual result:
--------------
string ???
printed to STDOUT

History

 [2009-02-12 10:06 UTC] max at injapan dot ru
The text in the "Expected result" field is slightly garbled: the 
expected output is, of course, just the single character U+2161.
 [2009-02-16 15:13 UTC] max at injapan dot ru
Problem solved by using the EUCJP-WIN encoding instead of EUC-JP.
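
The workaround above can be sketched as a side-by-side comparison of the two encoding names. This is a minimal illustration, not part of the original report: it assumes the mbstring extension is loaded and uses "eucJP-win", the name under which mbstring lists this encoding (encoding names are matched case-insensitively, so "EUCJP-WIN" also works). A likely explanation, which the report itself does not state, is that the lead byte 0xAD addresses row 13 of the character table, which holds vendor extension characters outside the standard JIS X 0208 set. What plain "EUC-JP" returns may differ between PHP versions, so no output is claimed for it here.

```php
<?php
// Byte sequence from the report; expected to decode to U+2161 (Ⅱ).
$bytes = "\xAD\xB6";

// Convert the same bytes with both source encodings and keep the results.
$results = [];
foreach (['EUC-JP', 'eucJP-win'] as $from) {
    $results[$from] = mb_convert_encoding($bytes, 'UTF-8', $from);
}

// Print each result with its hex dump so an unprintable or substituted
// character is still visible in the output.
foreach ($results as $from => $utf8) {
    printf("%-10s => %s (hex %s)\n", $from, $utf8, bin2hex($utf8));
}
```

If the "eucJP-win" line shows hex e285a1, the bytes were decoded to U+2161 as the reporter expected. The same loop can be extended over \xAD\xB5 through \xAD\xBF to check the rest of the range mentioned in the description.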
 