Bug #33793 Mojibake produced when converting utf-8 to sjis with certain characters
Submitted: 2005-07-21 03:19 UTC Modified: 2005-11-13 20:47 UTC
From: lars dot jensen at careercross dot com Assigned: moriyoshi (profile)
Status: Not a bug Package: mbstring related
PHP Version: 5.1.0-dev OS: FreeBSD 5.3
Private report: No CVE-ID: None
 [2005-07-21 03:19 UTC] lars dot jensen at careercross dot com
I am writing a class that converts UTF-8 input into SJIS using the usual call:

$body = mb_convert_encoding($body, "SJIS", "UTF-8");

This usually works, but in testing so far I have identified three kanji that this function fails to convert correctly, which produces mojibake.
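A minimal sketch of how characters like these could be found systematically, assuming PHP 7.4+ for `mb_str_split()`: convert each character to the target encoding and back, and flag any that do not survive the round trip (mb_convert_encoding() substitutes a replacement character for unmappable input rather than failing). `find_unconvertible()` is a hypothetical helper, not part of mbstring.

```php
<?php
// Hypothetical helper (not part of mbstring): returns the distinct characters
// of a UTF-8 string that do not survive a UTF-8 -> $target -> UTF-8 round trip.
function find_unconvertible(string $utf8, string $target): array
{
    $bad = [];
    foreach (mb_str_split($utf8, 1, 'UTF-8') as $char) {
        // Encode one character, then decode it again.
        $encoded = mb_convert_encoding($char, $target, 'UTF-8');
        $back = mb_convert_encoding($encoded, 'UTF-8', $target);
        if ($back !== $char) {
            $bad[] = $char; // lost or replaced during conversion
        }
    }
    return array_values(array_unique($bad));
}

// Example: list characters in a test string that "SJIS" cannot represent.
print_r(find_unconvertible("株式会社 \u{3231} \u{3232} \u{3239}", 'SJIS'));
```

Running the same check against other target encodings shows which one actually covers the input.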

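The three characters are U+3231 (㈱), U+3232 (㈲) and U+3239 (㈹), encoded in UTF-8 as the byte sequences E3 88 B1, E3 88 B2 and E3 88 B9,

which in SJIS I expect to correspond to 0x878A, 0x878B and 0x878C.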
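The byte sequences used in the workaround below can be decoded to Unicode code points to confirm which characters are involved. A short sketch, assuming PHP 7.2+ for `mb_ord()`:

```php
<?php
// The three UTF-8 byte sequences replaced by the workaround.
$sequences = [
    chr(227) . chr(136) . chr(177),
    chr(227) . chr(136) . chr(178),
    chr(227) . chr(136) . chr(185),
];

// Print each sequence as hex together with the code point it encodes.
foreach ($sequences as $bytes) {
    printf("%s -> U+%04X\n", strtoupper(bin2hex($bytes)), mb_ord($bytes, 'UTF-8'));
}
// E388B1 -> U+3231
// E388B2 -> U+3232
// E388B9 -> U+3239
```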

Reproduce code:
I created the following quick-and-dirty workaround, which is surely not optimal:

// Protect U+3231, U+3232 and U+3239 (UTF-8 bytes E3 88 B1 / B2 / B9),
// which mb_convert_encoding() does not convert to SJIS correctly.
$body = str_replace(chr(227).chr(136).chr(177), '#mojihack1#', $body);
$body = str_replace(chr(227).chr(136).chr(178), '#mojihack2#', $body);
$body = str_replace(chr(227).chr(136).chr(185), '#mojihack3#', $body);

$body = mb_convert_encoding($body, "SJIS", "UTF-8");

// Put the characters back as the two-byte codes 0x878A, 0x878B and 0x878C.
$body = str_replace('#mojihack1#', chr(135).chr(138), $body);
$body = str_replace('#mojihack2#', chr(135).chr(139), $body);
$body = str_replace('#mojihack3#', chr(135).chr(140), $body);


 [2005-07-21 10:19 UTC]
Assigned to the maintainer.
 [2005-07-21 12:47 UTC]
Please try using this CVS snapshot:
For Windows:

 [2005-07-22 08:04 UTC] lars dot jensen at careercross dot com
I upgraded to PHP 5.1.0-dev via the posted link, but I get exactly the same result as before.

I don't have access to a Windows server, so I haven't tested the Win32 version.
 [2005-11-13 20:47 UTC]
This is not a bug.

The character that corresponds to the code point U+3231 is not contained in the coded character set called "Shift_JIS".

Instead, a similar coded character set, "CP932", which is often mistaken for "Shift_JIS" for historical reasons, does contain that character.

Your problem will probably be solved by simply specifying "CP932" in place of every occurrence of "SJIS".
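A minimal sketch of that suggestion, assuming PHP 7+ for the `\u{...}` string escape: with "CP932" as the target, the three characters from the report convert directly to the NEC special-character codes that the workaround was inserting by hand, and converting back recovers the original text.

```php
<?php
// ㈱㈲㈹ (U+3231, U+3232, U+3239) as a UTF-8 string.
$body = "\u{3231}\u{3232}\u{3239}";

// Target CP932 instead of SJIS; no placeholder replacement is needed.
$cp932 = mb_convert_encoding($body, 'CP932', 'UTF-8');
echo bin2hex($cp932), "\n"; // 878a878b878c

// Decoding from CP932 gives back the original UTF-8 string.
var_dump(mb_convert_encoding($cp932, 'UTF-8', 'CP932') === $body); // bool(true)
```

The same one-word change applies to the original call, i.e. `mb_convert_encoding($body, "CP932", "UTF-8")`.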

PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Jul 23 04:01:29 2024 UTC