Bug #33793 Mojibake produced when converting utf-8 to sjis with certain characters
Submitted: 2005-07-21 03:19 UTC Modified: 2005-11-13 20:47 UTC
From: lars dot jensen at careercross dot com Assigned: moriyoshi (profile)
Status: Not a bug Package: mbstring related
PHP Version: 5.1.0-dev OS: FreeBSD 5.3
Private report: No CVE-ID: None
 [2005-07-21 03:19 UTC] lars dot jensen at careercross dot com
Description:
------------
I am writing a class that converts UTF-8 input to SJIS using the usual call
$body = mb_convert_encoding($body, "SJIS", "UTF-8");
This usually works, but in testing so far I have identified three kanji that the function fails to convert correctly, which produces mojibake.

The three UTF-8 characters are identified as follows:
chr(227).chr(136).chr(177)
chr(227).chr(136).chr(178)
chr(227).chr(136).chr(185)

which in SJIS correspond to
chr(135).chr(138)
chr(135).chr(139)
chr(135).chr(140)
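
For reference, the three UTF-8 byte sequences above can be decoded to their Unicode code points. This is a minimal sketch, assuming PHP 7.2 or later for mb_ord() (a function that did not exist in the PHP version this report was filed against); the variable names are illustrative only.

```php
<?php
// The three UTF-8 byte sequences listed in the report.
$sequences = [
    chr(227) . chr(136) . chr(177),
    chr(227) . chr(136) . chr(178),
    chr(227) . chr(136) . chr(185),
];

// mb_ord() returns the Unicode code point of the first character in the string.
$codepoints = array_map(
    function (string $s): string {
        return sprintf('U+%04X', mb_ord($s, 'UTF-8'));
    },
    $sequences
);

// Prints one code point per line: U+3231, U+3232, U+3239.
echo implode("\n", $codepoints), "\n";
```

These are the parenthesized ideograph characters ㈱, ㈲ and ㈹.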

Reproduce code:
---------------
I created a quick-and-dirty workaround as follows, which surely isn't optimal:

$body = str_replace(chr(227).chr(136).chr(177), '#mojihack1#', $body);
$body = str_replace(chr(227).chr(136).chr(178), '#mojihack2#', $body);
$body = str_replace(chr(227).chr(136).chr(185), '#mojihack3#', $body);

$body = mb_convert_encoding($body, "SJIS", "UTF-8");

$body = str_replace('#mojihack1#', chr(135).chr(138), $body);
$body = str_replace('#mojihack2#', chr(135).chr(139), $body);
$body = str_replace('#mojihack3#', chr(135).chr(140), $body);



 [2005-07-21 10:19 UTC] tony2001@php.net
Assigned to the maintainer.
 [2005-07-21 12:47 UTC] sniper@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php5-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php5-win32-latest.zip


 [2005-07-22 08:04 UTC] lars dot jensen at careercross dot com
I upgraded to PHP 5.1.0-dev via the posted link, but I get exactly the same result as before.

I don't have access to a Windows server, so I haven't tested the Win32 version.
 [2005-11-13 20:47 UTC] moriyoshi@php.net
This is not a bug.

The character at code point U+3231 is not part of the coded character set called "Shift_JIS".

Instead, a similar coded character set, "CP932", which is often mistaken for "Shift_JIS" for historical reasons, does contain that character.

Your problem can probably be solved by simply specifying "CP932" wherever you currently use "SJIS".
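
The suggested change could look like the sketch below. It assumes the mbstring extension is loaded and accepts "CP932" as an encoding name, as the comment above implies; the variable names are illustrative and not taken from the reporter's class.

```php
<?php
// The three characters from the report (U+3231, U+3232, U+3239) as UTF-8 bytes.
$utf8 = "\xE3\x88\xB1\xE3\x88\xB2\xE3\x88\xB9";

// Convert with "CP932" instead of "SJIS", as suggested above.
$cp932 = mb_convert_encoding($utf8, 'CP932', 'UTF-8');

// Show the resulting bytes as hex so they can be compared with the
// byte pairs the reporter expected (0x87 0x8A, 0x87 0x8B, 0x87 0x8C).
echo bin2hex($cp932), "\n";

// Round-trip back to UTF-8 to check that nothing was lost.
$roundTrip = mb_convert_encoding($cp932, 'UTF-8', 'CP932');
var_dump($roundTrip === $utf8);
```

With this change the str_replace() placeholder workaround from the original report should no longer be needed.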



 
Last updated: Sun Oct 27 16:01:27 2024 UTC