Bug #45017 iconv removes tildes when converting from UTF-8 to Shift_JIS
Submitted: 2008-05-16 02:11 UTC Modified: 2008-05-18 18:33 UTC
From: nospam at nihonbunka dot com Assigned:
Status: Not a bug Package: *Unicode Issues
PHP Version: 5.2.6 OS: BSD
Private report: No CVE-ID: None

 
 [2008-05-16 02:11 UTC] nospam at nihonbunka dot com
Description:
------------
iconv does not seem to convert single-byte tildes from UTF-8 to Shift_JIS.

Please bear in mind that Shift_JIS does not put the tilde where one would expect it:
http://en.wikipedia.org/wiki/Shift-JIS
"The single-byte characters 0x00 to 0x7F match the ASCII encoding, except for a yen sign at 0x5C and an overline at 0x7E in place of the ASCII character set's backslash and tilde respectively."

If I use //IGNORE, the tildes just disappear. If I use //TRANSLIT, the bug is even worse: everything from the first ~ onwards disappears.

There was also a double byte tilde problem in the past, but this is different. 



Reproduce code:
---------------
<?PHP 
$conv_str = iconv('utf-8','shift-jis'.'//IGNORE','where are the (~) (~) tildes?'); 
echo ($conv_str);
?>



Expected result:
----------------
where are the (~) (~) tildes?

the above in Shift_JIS, using //IGNORE



Actual result:
--------------
where are the () () tildes?

using //IGNORE and

where are the (

using //TRANSLIT
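What the reproduce script prints depends on the iconv library underneath PHP: the reporter is on BSD, where //TRANSLIT truncates, while other builds may return false instead. A more diagnostic reproduction prints the raw bytes for each variant and distinguishes an outright failure from a shortened result. A minimal sketch, assuming PHP 7.1+ for the `?string` return type; `convert_checked` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: run iconv and return null on failure instead of
// letting a false or shortened string reach the page unnoticed.
function convert_checked(string $utf8, string $to): ?string
{
    // @ suppresses the notice iconv raises on unconvertible input;
    // the return value is checked explicitly instead.
    $result = @iconv('UTF-8', $to, $utf8);
    return $result === false ? null : $result;
}

$input = 'where are the (~) (~) tildes?';
foreach (['SHIFT-JIS', 'SHIFT-JIS//IGNORE', 'SHIFT-JIS//TRANSLIT'] as $to) {
    $out = convert_checked($input, $to);
    // Print hex so a dropped or truncated tilde (byte 7e) is easy to spot.
    echo str_pad($to, 20), $out === null ? 'FAILED' : bin2hex($out), PHP_EOL;
}
```

Comparing the three hex lines shows directly whether the library failed, dropped the tildes, or stopped at the first one.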

History
 [2008-05-16 06:12 UTC] derick@php.net
That's because Shift_JIS doesn't support the (ASCII) tilde (Unicode U+007E): http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P130-1999&ShowSet&s=ALL#ShowSet
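The distinction is easier to see at the byte level. This minimal sketch (PHP 7.0+, no iconv involved) prints the UTF-8 bytes of the ASCII tilde and of the overline that the Wikipedia quote above places at 0x7E in Shift_JIS; a strict Shift_JIS converter has a slot for the second character but none for the first.

```php
<?php
// U+007E TILDE is a single byte, 0x7E, in both ASCII and UTF-8.
$tilde = '~';
// U+203E OVERLINE is what JIS X 0201, the single-byte half of Shift_JIS,
// assigns to byte 0x7E. The "\u{...}" escape needs PHP 7.0 or later.
$overline = "\u{203E}";

echo 'U+007E tilde:    ', bin2hex($tilde), PHP_EOL;    // 7e
echo 'U+203E overline: ', bin2hex($overline), PHP_EOL; // e280be
```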
 [2008-05-16 07:16 UTC] nospam at nihonbunka dot com
Hmm, the tilde is often displayed and used in Japan. How can this be?
I have web pages, such as the one below, where tildes can be typed in and displayed on a Shift_JIS-encoded page:

http://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php

The contents of this file are:
<?php
$string = 'https://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php'; // the UTF-8 string we start with
echo 'this is what we start with = ' . $string . '<BR />';

// Direct conversion: on this setup, //TRANSLIT cuts the output off at the first tilde.
$conv_str = iconv('utf-8', 'shift-jis//TRANSLIT', $string);
echo 'this is not working = ' . $conv_str . '<BR />';

// Workaround step 1: replace each tilde with a placeholder before converting.
$rstring = str_replace('~', '1bytetilde', $string);
echo 'this is modified string here = ' . $rstring . '<BR />';

// Step 2: convert the tilde-free string.
$conv_str2 = iconv('utf-8', 'shift-jis//TRANSLIT', $rstring);

// Step 3: put back byte 0x7E (chr(126)). Strictly, Shift_JIS assigns 0x7E to
// the overline, but browsers display it as a tilde.
$rerstring = str_replace('1bytetilde', chr(126), $conv_str2);
echo 'this is the correct result = ' . $rerstring . '<BR />';
?>
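The same workaround can be written without a placeholder word, which would misfire if the text ever contained "1bytetilde" itself: split on the tilde, convert each piece, and join the pieces with byte 0x7E. This is safe because byte 0x7E never occurs inside a multi-byte UTF-8 sequence and both encodings are stateless. A sketch assuming PHP 7.1+; `sjis_keep_tilde` is a hypothetical name, and the //TRANSLIT suffix is dropped so that any other unconvertible character makes the function return null rather than silently cutting the string.

```php
<?php
// Hypothetical helper: convert UTF-8 to Shift_JIS, keeping each ASCII tilde
// as the single byte 0x7E (chr(126)) instead of letting iconv drop it.
function sjis_keep_tilde(string $utf8, string $to = 'SHIFT-JIS'): ?string
{
    // 0x7E never appears inside a multi-byte UTF-8 sequence, so splitting
    // on '~' cannot cut a character in half.
    $pieces = explode('~', $utf8);
    foreach ($pieces as $i => $piece) {
        if ($piece === '') {
            continue;  // nothing to convert between adjacent tildes
        }
        $converted = @iconv('UTF-8', $to, $piece);
        if ($converted === false) {
            return null;  // some other character has no mapping in $to
        }
        $pieces[$i] = $converted;
    }
    return implode(chr(126), $pieces);
}

echo sjis_keep_tilde('https://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php'), PHP_EOL;
```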
 [2008-05-16 07:59 UTC] nospam at nihonbunka dot com
Given that there is no single-byte tilde in Shift_JIS, shouldn't //TRANSLIT still give the ASCII code?

//TRANSLIT seems to stop at the tilde and output neither the ASCII code nor the rest of the string.

For some unknown reason, Shift_JIS-encoded pages display the chr(126) (0x7E) character as a tilde rather than an overline, so the missing tilde would not be a problem if it were transliterated to 0x7E.
http://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php
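Whether //TRANSLIT substitutes, drops, or truncates depends on the iconv library, so one portable way to catch the silent loss described above is to convert to Shift_JIS and back and compare the result with the input. A minimal sketch (PHP 7+); `roundtrips` is a hypothetical helper, and the check is deliberately conservative: it also rejects strings that were transliterated rather than lost.

```php
<?php
// Hypothetical check: true only if $utf8 survives UTF-8 -> $encoding -> UTF-8
// unchanged, so dropped, truncated, or transliterated characters all fail.
function roundtrips(string $utf8, string $encoding = 'SHIFT-JIS'): bool
{
    $encoded = @iconv('UTF-8', $encoding . '//TRANSLIT', $utf8);
    if ($encoded === false) {
        return false;  // iconv reported an error outright
    }
    $back = @iconv($encoding, 'UTF-8', $encoded);
    return $back === $utf8;
}

var_dump(roundtrips('where are the (~) (~) tildes?'));
```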
 [2008-05-18 18:33 UTC] nospam at nihonbunka dot com
The Microsoft version of Shift_JIS (code page 932) does have a tilde.

http://www.microsoft.com/globaldev/reference/dbcs/932.mspx
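If the target system accepts Microsoft's code page 932, converting to it directly avoids the workaround, since CP932 keeps the ASCII tilde at 0x7E. This also matches what browsers do: the WHATWG Encoding Standard's Shift_JIS decoder passes bytes below 0x80 through as ASCII, which is why 0x7E is displayed as a tilde. A sketch, assuming PHP 7.1+ and an iconv library that knows the name 'CP932' (GNU libiconv, for example, lists it); `to_cp932` is a hypothetical helper.

```php
<?php
// Hypothetical helper: convert to Microsoft code page 932, where bytes below
// 0x80 are plain ASCII, so '~' stays '~'. Returns null if the conversion
// fails or the local iconv does not know 'CP932'.
function to_cp932(string $utf8): ?string
{
    $result = @iconv('UTF-8', 'CP932', $utf8);
    return $result === false ? null : $result;
}

$conv_str = to_cp932('where are the (~) (~) tildes?');
echo $conv_str ?? 'CP932 conversion failed', PHP_EOL;
```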
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Nov 09 20:01:29 2024 UTC