Bug #33793 Mojibake produced when converting utf-8 to sjis with certain characters
Submitted: 2005-07-21 03:19 UTC Modified: 2005-11-13 20:47 UTC
From: lars dot jensen at careercross dot com Assigned: moriyoshi (profile)
Status: Not a bug Package: mbstring related
PHP Version: 5.1.0-dev OS: FreeBSD 5.3
Private report: No CVE-ID: None
 [2005-07-21 03:19 UTC] lars dot jensen at careercross dot com
Description:
------------
I am writing a class that converts UTF-8 input to SJIS with the usual call
$body = mb_convert_encoding($body, "SJIS", "UTF-8");
This usually works, but in testing so far I have identified three kanji that the function fails to convert correctly, which causes mojibake.

The three UTF-8 characters are identified as follows:
chr(227).chr(136).chr(177)
chr(227).chr(136).chr(178)
chr(227).chr(136).chr(185)

which in SJIS correspond to
chr(135).chr(138)
chr(135).chr(139)
chr(135).chr(140)
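
For reference, the byte sequences above can be decoded to Unicode code points. This is only a sketch: it assumes mb_ord(), which exists from PHP 7.2 onward and is therefore not available in the 5.1.0-dev build this report was filed against, and the variable names are illustrative.

```php
<?php
// The three UTF-8 byte sequences from the report, built with chr() as above.
$sequences = [
    chr(227) . chr(136) . chr(177),
    chr(227) . chr(136) . chr(178),
    chr(227) . chr(136) . chr(185),
];

$codepoints = [];
foreach ($sequences as $bytes) {
    // mb_ord() returns the code point of the first character in the string.
    $codepoints[] = sprintf("U+%04X", mb_ord($bytes, "UTF-8"));
}

echo implode(" ", $codepoints), "\n"; // U+3231 U+3232 U+3239
```

All three fall in the "Enclosed CJK Letters and Months" block (parenthesized ideographs).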

Reproduce code:
---------------
I created a quick-and-dirty workaround as follows, which surely isn't optimal:

$body = str_replace(chr(227).chr(136).chr(177), '#mojihack1#', $body);
$body = str_replace(chr(227).chr(136).chr(178), '#mojihack2#', $body);
$body = str_replace(chr(227).chr(136).chr(185), '#mojihack3#', $body);

$body = mb_convert_encoding($body, "SJIS", "UTF-8");

$body = str_replace('#mojihack1#', chr(135).chr(138), $body);
$body = str_replace('#mojihack2#', chr(135).chr(139), $body);
$body = str_replace('#mojihack3#', chr(135).chr(140), $body);



 [2005-07-21 10:19 UTC] tony2001@php.net
Assigned to the maintainer.
 [2005-07-21 12:47 UTC] sniper@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php5-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php5-win32-latest.zip


 [2005-07-22 08:04 UTC] lars dot jensen at careercross dot com
Upgraded to PHP Version 5.1.0-dev via the posted link, but I see exactly the same result as before.

I don't have access to a Windows server, so I haven't tested the Win32 version.
 [2005-11-13 20:47 UTC] moriyoshi@php.net
This is not a bug.

The character at code point U+3231 is not part of the coded character set called "Shift_JIS".

It is, however, part of "CP932", a similar coded character set that is often mistaken for "Shift_JIS" for historical reasons.

Your problem can probably be solved by simply specifying "CP932" wherever you currently use "SJIS".
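
A minimal sketch of that suggestion, using the first character from the report. It assumes the mbstring extension is loaded and that the "CP932" encoding name is accepted by the build in use. The "SJIS" result is printed only for comparison, since what mbstring emits for an unmappable character depends on the version and on the configured substitute character.

```php
<?php
// U+3231, the first character from the report, as UTF-8 bytes.
$utf8 = "\xE3\x88\xB1";

// CP932 contains this character, so the conversion yields the expected bytes.
$cp932 = mb_convert_encoding($utf8, "CP932", "UTF-8");
echo "CP932: ", bin2hex($cp932), "\n"; // CP932: 878a

// Plain Shift_JIS as the target, shown for comparison only.
$sjis = mb_convert_encoding($utf8, "SJIS", "UTF-8");
echo "SJIS:  ", bin2hex($sjis), "\n";
```

The bytes 0x87 0x8A are the same as chr(135).chr(138) in the original report, so the str_replace() workaround should no longer be needed once the target encoding is "CP932".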



 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 21 12:01:31 2024 UTC