Bug #38425 iconv's //TRANSLIT not working properly under euc-jp
Submitted: 2006-08-11 11:19 UTC Modified: 2006-08-11 13:15 UTC
From: stronk7 at moodle dot org Assigned:
Status: Not a bug Package: ICONV related
PHP Version: 5.1.4 OS: Irrelevant
Private report: No CVE-ID: None
 [2006-08-11 11:19 UTC] stronk7 at moodle dot org
I'm using iconv (with the //TRANSLIT option enabled) in order 
to convert from a lot of different encodings to UTF-8.

Everything seems to be working fine, but I've found one 
situation where, perhaps, //TRANSLIT isn't working properly.

It happens when I'm trying to convert from euc-jp to utf-8 
and the string contains some 0xA0 chars.

The whole string is English text, but some of the spaces between 
words use the 0xA0 char (instead of the correct 0x20).

I'm not an expert on euc-jp, but it seems that the 0xA0 
char has no meaning at all in that encoding.

Perhaps it shouldn't break the conversion process, at least 
when running under //TRANSLIT mode? 

The //IGNORE mode seems to work OK, ignoring such characters. 
If I read the documentation correctly, it seems that //TRANSLIT 
should transform such a character (hopefully to a space, if 
I'm not wrong).


 [2006-08-11 12:19 UTC]
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.

 [2006-08-11 12:51 UTC] stronk7 at moodle dot org
Here is an example script:


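    <?php
    // Show which PHP version is running.
    echo phpversion();

    // chr(240) is the single byte 0xF0, placed between two ASCII words.
    $original = 'Hello' . chr(240) . 'World';
    // Convert from EUC-JP to UTF-8 with transliteration requested.
    $result = iconv('EUC-JP', 'UTF-8//TRANSLIT', $original);

    echo $result;
    ?>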

IMO, it should return:

"Hello?World" (or, if I'm not wrong, "Hello World")

Instead it returns "Hello" and stops, whereas the TRANSLIT 
mode should let the conversion reach the end.
 [2006-08-11 12:58 UTC]
Notice: iconv(): Detected an illegal character in input string
Even though the character might be legal, it's up to libiconv to decide if it's legal or not and PHP can't change that.
 [2006-08-11 13:15 UTC] stronk7 at moodle dot org
Thanks, I agree!

But shouldn't the //TRANSLIT mode change this behaviour and 
allow the conversion to continue? If not, perhaps a minor 
change to the manual page could help, because it currently 
suggests that both the //TRANSLIT and //IGNORE modes continue 
with the conversion, and that's not true for the //TRANSLIT mode.
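
In the meantime, one possible caller-side workaround is to replace the stray 0xA0 bytes before calling iconv(). This is only a minimal sketch, not an official fix: the function name replace_stray_a0 is hypothetical, and it assumes that 0xA0 never occurs in valid EUC-JP input (its multibyte sequences use bytes 0xA1-0xFE plus the 0x8E and 0x8F prefixes), so replacing it should not split a real character.

```php
<?php
// Hypothetical helper (not part of PHP or iconv): swap stray 0xA0 bytes for
// ordinary spaces so the later EUC-JP conversion only sees bytes it accepts.
function replace_stray_a0($bytes)
{
    return str_replace("\xA0", ' ', $bytes);
}

// English text with a 0xA0 byte where a normal space (0x20) was intended.
$original = 'Hello' . chr(0xA0) . 'World';
$cleaned  = replace_stray_a0($original);

// The conversion itself still depends on the iconv implementation in use.
$result = iconv('EUC-JP', 'UTF-8//TRANSLIT', $cleaned);
echo $result;
```

The clean-up step is plain byte replacement, so it behaves the same regardless of which iconv library PHP was built against; only the final iconv() call remains implementation-dependent.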