Bug #45017 iconv removes tildes when converting from UTF-8 to shift_jis
Submitted: 2008-05-16 02:11 UTC Modified: 2008-05-18 18:33 UTC
From: nospam at nihonbunka dot com Assigned:
Status: Not a bug Package: *Unicode Issues
PHP Version: 5.2.6 OS: BSD
Private report: No CVE-ID: None
 [2008-05-16 02:11 UTC] nospam at nihonbunka dot com
iconv does not seem to convert single-byte tildes from UTF-8 to shift_jis.

Please bear in mind that shift_jis tildes are not where one would expect them to be.
"The single-byte characters 0x00 to 0x7F match the ASCII encoding, except for a yen sign at 0x5C and an overline at 0x7E in place of the ASCII character set's backslash and tilde respectively."

If I use //IGNORE, the tildes just disappear. If I use //TRANSLIT, the bug is even worse: everything from the first ~ onward disappears.

There was also a double byte tilde problem in the past, but this is different. 

Reproduce code:
$conv_str = iconv('utf-8','shift-jis'.'//IGNORE','where are the (~) (~) tildes?'); 
echo ($conv_str);

Expected result:
where are the (~) (~) tildes?

the above in shift_jis using //IGNORE

Actual result:
where are the () () tildes?

using //IGNORE and

where are the (

using //TRANSLIT


 [2008-05-16 06:12 UTC]
That's because shift-jis doesn't support the ASCII tilde (Unicode character U+007E).
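
To see which characters a particular iconv build refuses to map, one option is to convert one character at a time with no //IGNORE or //TRANSLIT suffix and collect the failures. This is only a sketch: `unmappable_chars` is a hypothetical helper, not a PHP function, it assumes PHP 7 or later, and the result depends on the iconv implementation PHP was built against. A libiconv-based build such as the reporter's would be expected to list ~ here, but other builds may not.

```php
<?php
// Hypothetical helper (not part of PHP): return the distinct characters of a
// UTF-8 string that the local iconv build cannot convert to $target.
function unmappable_chars(string $utf8, string $target): array
{
    // Split into code points; the /u flag makes PCRE treat the input as UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // input was not valid UTF-8, nothing to test
    }
    $bad = [];
    foreach ($chars as $ch) {
        // With no //IGNORE or //TRANSLIT suffix, iconv() returns false for a
        // character it cannot represent; @ hides the accompanying notice.
        if (@iconv('UTF-8', $target, $ch) === false) {
            $bad[] = $ch;
        }
    }
    return array_values(array_unique($bad));
}

print_r(unmappable_chars('where are the (~) (~) tildes?', 'SHIFT_JIS'));
```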
 [2008-05-16 07:16 UTC] nospam at nihonbunka dot com
Hmm, the tilde is often displayed and used in Japan. How can this be?
I have web pages, such as the one below, that I can type tildes into and that display them on a shift_jis-encoded page.

The contents of this file are:
$string = ''; // This is what we start off with
echo ('this is what we start with = '.$string.'<BR />'); //print string at start
$conv_str = iconv('utf-8','shift-jis'.'//TRANSLIT',$string); 
echo ('this is not working = '.$conv_str.'<BR />'); //Just to show that this is not working.

$rstring = preg_replace ('/~/','1bytetilde',$string);   //modify before conversion
echo ('this is modified string here = '.$rstring.'<BR />'); //This is the modified string

$conv_str2 = iconv('utf-8','shift-jis'.'//TRANSLIT',$rstring); //convert
$rereplace=chr(126); //$rereplace is a one byte tilde in shift_jis
$rerstring = preg_replace ('/1bytetilde/',$rereplace,$conv_str2); //rereplace with tildes
echo ('this is the correct result = '.$rerstring.'<BR />'); //the correct result
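
The same placeholder trick can be wrapped in a function so it is reusable. This is a sketch only: `utf8_to_sjis_keep_tilde` is a hypothetical name, it assumes PHP 7 or later, and it assumes the page really should carry byte 0x7E where the tilde was. The placeholder is plain ASCII letters and digits, which convert to the same bytes, and it is lengthened until it does not already occur in the input.

```php
<?php
// Hypothetical wrapper around the placeholder workaround; not a PHP built-in.
// Returns the converted string, or false if iconv() fails.
function utf8_to_sjis_keep_tilde(string $utf8)
{
    // Pick a placeholder that does not already occur in the input.
    $placeholder = '1bytetilde';
    while (strpos($utf8, $placeholder) !== false) {
        $placeholder .= 'x';
    }

    // Hide the tildes from iconv, convert, then put byte 0x7E back.
    $hidden = str_replace('~', $placeholder, $utf8);
    $converted = @iconv('UTF-8', 'SHIFT_JIS//IGNORE', $hidden);
    if ($converted === false) {
        return false;
    }
    return str_replace($placeholder, chr(0x7E), $converted);
}

echo bin2hex((string) utf8_to_sjis_keep_tilde('where are the (~) (~) tildes?')), "\n";
```

Because the placeholder starts with a digit, which is never a lead or trail byte of a double-byte shift_jis character, the final replacement should not match inside a double-byte sequence.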
 [2008-05-16 07:59 UTC] nospam at nihonbunka dot com
Given that there is no single-byte tilde in shift_jis, shouldn't "//TRANSLIT" output the ASCII code all the same?

//TRANSLIT seems to break at the tilde and outputs neither the ASCII code nor the rest of the string.

For some unknown reason, shift_jis-encoded pages seem to display chr(126), the 0x7E byte, as a tilde and not as an overline, so the fact that the tilde does not exist would not be a problem if it were transliterated.
 [2008-05-18 18:33 UTC] nospam at nihonbunka dot com
In the Microsoft version of Shift_JIS (CP932), there is a tilde.
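
If the pages are in fact meant to be in the Microsoft variant, naming that variant as the target encoding should let iconv map the tilde directly, without a placeholder. A minimal sketch, assuming the local iconv build recognizes the name CP932 (this varies between builds, so the failure case is handled):

```php
<?php
$utf8 = 'where are the (~) (~) tildes?';

// Ask for the Microsoft variant by name; no //IGNORE or //TRANSLIT, so any
// unmappable character makes iconv() return false instead of vanishing.
$sjis = @iconv('UTF-8', 'CP932', $utf8);

if ($sjis === false) {
    echo "conversion failed; this iconv build may not know CP932\n";
} else {
    // Dump the bytes so the 0x7E for each tilde can be checked by eye.
    echo bin2hex($sjis), "\n";
}
```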