Bug #45017 iconv removes tildes when converting from UTF-8 to Shift_JIS
Submitted: 2008-05-16 02:11 UTC Modified: 2008-05-18 18:33 UTC
From: nospam at nihonbunka dot com Assigned:
Status: Not a bug Package: *Unicode Issues
PHP Version: 5.2.6 OS: BSD
Private report: No CVE-ID: None
 [2008-05-16 02:11 UTC] nospam at nihonbunka dot com
Description:
------------
iconv does not seem to convert single-byte tildes from UTF-8 to Shift_JIS.

Please bear in mind that shift_jis tildes are not where one would expect them to be. 
http://en.wikipedia.org/wiki/Shift-JIS
"The single-byte characters 0x00 to 0x7F match the ASCII encoding, except for a yen sign at 0x5C and an overline at 0x7E in place of the ASCII character set's backslash and tilde respectively."

If I use //IGNORE, the tildes just disappear. If I use //TRANSLIT, the bug is even worse: everything from the first ~ onward disappears.

There was also a double byte tilde problem in the past, but this is different. 



Reproduce code:
---------------
<?PHP 
$conv_str = iconv('utf-8','shift-jis'.'//IGNORE','where are the (~) (~) tildes?'); 
echo ($conv_str);
?>



Expected result:
----------------
where are the (~) (~) tildes?

the above in shift_jis using //IGNORE



Actual result:
--------------
where are the () () tildes?

using //IGNORE and

where are the (

using //TRANSLIT
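
Because byte 0x7E may be rendered as either a tilde or an overline depending on the page encoding and font, the results above are easier to compare as bytes than as displayed text. The following is a minimal sketch, assuming PHP 7 or later; `hex_bytes()` is a hypothetical helper that is not part of the original report, and what iconv() actually returns depends on the installed iconv library.

```php
<?php
// Hypothetical helper (not from the original report): render a byte string
// as space-separated hex pairs so the converted output can be inspected.
function hex_bytes(string $bytes): string
{
    if ($bytes === '') {
        return '';
    }
    return implode(' ', str_split(bin2hex($bytes), 2));
}

$input = 'where are the (~) (~) tildes?';

// Errors are suppressed because iconv() may raise a notice and return false.
$converted = @iconv('UTF-8', 'SHIFT_JIS//IGNORE', $input);

echo 'input:  ', hex_bytes($input), PHP_EOL;
echo 'output: ', $converted === false ? '(conversion failed)' : hex_bytes($converted), PHP_EOL;
```

If the tilde survived, a `7e` byte appears between `28` and `29` in the output line; if it was dropped, `28` is followed directly by `29`.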

 [2008-05-16 06:12 UTC] derick@php.net
That's because Shift_JIS doesn't support the ASCII tilde (Unicode character U+007E): http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P130-1999&ShowSet&s=ALL#ShowSet
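
Whether a given character is representable in the target charset can be probed directly. This is a rough sketch, assuming PHP 7 or later; `is_convertible()` is a hypothetical helper, and the answer for the tilde depends on the conversion table of the local iconv library, so no particular output is claimed here.

```php
<?php
// Hypothetical helper: report whether a single UTF-8 character survives a
// strict conversion (no //IGNORE or //TRANSLIT suffix) to $charset.
function is_convertible(string $utf8Char, string $charset): bool
{
    // Errors are suppressed because iconv() raises a notice or warning on failure.
    $out = @iconv('UTF-8', $charset, $utf8Char);

    // Newer PHP versions return false on failure; older ones may return
    // a truncated (here: empty) string instead.
    return $out !== false && $out !== '';
}

// Probe a plain letter, the ASCII tilde (U+007E) and the overline (U+203E).
foreach (['A', '~', "\u{203E}"] as $char) {
    printf(
        "UTF-8 bytes %s -> SHIFT_JIS: %s\n",
        bin2hex($char),
        is_convertible($char, 'SHIFT_JIS') ? 'convertible' : 'not convertible'
    );
}
```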
 [2008-05-16 07:16 UTC] nospam at nihonbunka dot com
Hmm, the tilde is often displayed and used in Japan. How can this be?
I have web pages such as the one below, which I can type a tilde into and display it on a shift_jis encoded page

http://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php

The contents of this file are
<?PHP 
$string = 'https://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php'; // This is what we start off with
echo ('this is what we start with = '.$string.'<BR />'); //print string at start
$conv_str = iconv('utf-8','shift-jis'.'//TRANSLIT',$string); 
echo ('this is not working = '.$conv_str.'<BR />'); //Just to show that this is not working.

$rstring = preg_replace ('/~/','1bytetilde',$string);   //modify before conversion
echo ('this is modified string here = '.$rstring.'<BR />'); //This is the modified string

$conv_str2 = iconv('utf-8','shift-jis'.'//TRANSLIT',$rstring); //convert
$rereplace=chr(126); //$rereplace is a one byte tilde in shift_jis
$rerstring = preg_replace ('/1bytetilde/',$rereplace,$conv_str2); //rereplace with tildes
echo ('this is the correct result = '.$rerstring.'<BR />'); //the correct result
?>
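
The same workaround can be written without a placeholder string by splitting the input on the tilde, converting each piece, and joining the pieces with byte 0x7E. This is only a sketch under assumptions: `utf8_to_sjis_keep_tilde()` is a hypothetical name, PHP 7 or later is assumed, and the function returns false if any other character in the input cannot be converted.

```php
<?php
// Hypothetical helper: convert UTF-8 text to Shift_JIS while emitting each
// ASCII tilde as the single byte 0x7E. Returns the converted string, or
// false if some other character cannot be converted.
function utf8_to_sjis_keep_tilde(string $utf8)
{
    $pieces = [];

    // Splitting on '~' is safe in UTF-8: byte 0x7E never occurs inside a
    // multi-byte sequence.
    foreach (explode('~', $utf8) as $piece) {
        $converted = ($piece === '') ? '' : @iconv('UTF-8', 'SHIFT_JIS', $piece);
        if ($converted === false) {
            return false;
        }
        $pieces[] = $converted;
    }

    // Rejoin with the raw byte that Shift_JIS pages display as a tilde.
    return implode(chr(0x7E), $pieces);
}

$url = 'https://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php';
var_dump(utf8_to_sjis_keep_tilde($url) === $url);
```

Avoiding the placeholder means the input cannot collide with a marker such as '1bytetilde' that happens to occur in the text itself.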
 [2008-05-16 07:59 UTC] nospam at nihonbunka dot com
Given that there is no single-byte tilde in shift_jis, shouldn't "//TRANSLIT" output the ASCII code all the same?

//TRANSLIT seems to break at the tilde and not display either the ascii code or the rest of the string. 

For some unknown reason, shift-jis encoded pages seem to display the chr(126) (0x7E) character as a tilde and not an overline, so the fact that the tilde does not exist would not be a problem if it were TRANSLITted.
http://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php
 [2008-05-18 18:33 UTC] nospam at nihonbunka dot com
In the Microsoft version of Shift_JIS (code page 932) there is a tilde.
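
If the Microsoft variant is what the page really needs, it can be requested by name. This is a minimal sketch, assuming PHP 7 or later; `utf8_to_cp932()` is a hypothetical helper, 'SJIS-win' is the name mbstring uses for this variant, and whether the local iconv library recognises 'CP932' is an assumption that should be checked on the target system.

```php
<?php
// Hypothetical helper: convert UTF-8 text to the Microsoft Shift_JIS
// variant (code page 932), in which byte 0x7E is the ASCII tilde.
function utf8_to_cp932(string $utf8)
{
    if (function_exists('mb_convert_encoding')) {
        // Preferred when the mbstring extension is loaded.
        return mb_convert_encoding($utf8, 'SJIS-win', 'UTF-8');
    }

    // Assumption: the iconv library on this system knows the name 'CP932'.
    // Returns false if it does not, or if a character cannot be converted.
    return @iconv('UTF-8', 'CP932', $utf8);
}

$converted = utf8_to_cp932('where are the (~) (~) tildes?');
echo $converted === false ? '(conversion failed)' : bin2hex($converted), PHP_EOL;
```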

http://www.microsoft.com/globaldev/reference/dbcs/932.mspx
 