Bug #38425 iconv's //TRANSLIT not working properly under euc-jp
Submitted: 2006-08-11 11:19 UTC Modified: 2006-08-11 13:15 UTC
From: stronk7 at moodle dot org Assigned:
Status: Not a bug Package: ICONV related
PHP Version: 5.1.4 OS: Irrelevant
Private report: No CVE-ID: None
 [2006-08-11 11:19 UTC] stronk7 at moodle dot org
Description:
------------
I'm using iconv (with the //TRANSLIT option enabled) to 
convert from many different encodings to UTF-8.

Everything seems to be working fine, but I've found one 
situation where, perhaps, //TRANSLIT isn't working properly.

It happens when I'm trying to convert from euc-jp to utf-8 
and the string contains some 0xA0 chars.

The whole string is English text, but with some spaces 
between words using the 0xA0 char (instead of the correct 
0x20 char).

I'm not an expert on euc-jp, but it seems that the 0xA0 
char has no meaning at all in that encoding.

http://lfw.org/text/jp.html#euc

Perhaps it shouldn't break the conversion process, at least 
when running under //TRANSLIT mode? 

The //IGNORE mode seems to work OK, ignoring such characters. 
If I read the documentation correctly, it seems that 
//TRANSLIT should transform such a character (hopefully to a 
space, if I'm not wrong).
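
To make the comparison concrete, here is a minimal sketch that runs the same input through plain, //TRANSLIT and //IGNORE conversion. The helper name compare_iconv_modes() is hypothetical (it is not part of PHP), and what each mode returns depends on the PHP version and the underlying iconv implementation, so no output is shown.

```php
<?php
// Hypothetical helper: run one conversion with no suffix, //TRANSLIT and //IGNORE.
function compare_iconv_modes($input, $from = 'EUC-JP', $to = 'UTF-8')
{
    $results = array();
    foreach (array('', '//TRANSLIT', '//IGNORE') as $suffix) {
        $label = ($suffix === '') ? 'plain' : $suffix;
        // @ silences the notice/warning iconv() raises when it rejects the input.
        $results[$label] = @iconv($from, $to . $suffix, $input);
    }
    return $results;
}

// English text with a 0xA0 byte between the words, as in the description.
$input = 'Hello' . chr(0xA0) . 'World';
foreach (compare_iconv_modes($input) as $mode => $out) {
    echo str_pad($mode, 12), var_export($out, true), PHP_EOL;
}
```

Printing with var_export() makes it easy to tell a truncated string from a false return value.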



 [2006-08-11 12:19 UTC] tony2001@php.net
Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.


 [2006-08-11 12:51 UTC] stronk7 at moodle dot org
Here is an example script:

<?php

    echo phpversion();

    $original = 'Hello' . chr(240) . 'World';
    
    $result = iconv('EUC-JP', 'UTF-8//TRANSLIT', $original);

    echo $result;
?>

IMO, it should return:

"Hello?World" (or, if I'm not wrong, "Hello World")

Instead it returns "Hello" and stops, whereas the //TRANSLIT 
mode should let the conversion reach the end.
 [2006-08-11 12:58 UTC] tony2001@php.net
Notice: iconv(): Detected an illegal character in input string
Even though the character might be legal, it's up to libiconv to decide if it's legal or not and PHP can't change that.
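
For anyone who wants to see that message from inside a script rather than in the error log, the following is a rough sketch, not an official recipe. The wrapper iconv_with_diagnostics() is a hypothetical name; it only uses set_error_handler() and restore_error_handler() to capture whatever iconv() reports. The sample call uses the 0xA0 byte from the description, whereas the script in the report passes chr(240).

```php
<?php
// Hypothetical wrapper: returns array(result, message) so the caller can see
// both what iconv() produced and what it complained about, if anything.
function iconv_with_diagnostics($from, $to, $input)
{
    $message = null;
    set_error_handler(function ($errno, $errstr) use (&$message) {
        $message = $errstr;
        return true; // handled here; skip PHP's default error handler
    });
    try {
        $result = iconv($from, $to, $input);
    } finally {
        // Always put the previous handler back, even if something throws.
        restore_error_handler();
    }
    return array($result, $message);
}

list($result, $message) = iconv_with_diagnostics(
    'EUC-JP',
    'UTF-8//TRANSLIT',
    'Hello' . chr(0xA0) . 'World'
);
if ($message !== null) {
    echo 'iconv reported: ', $message, PHP_EOL;
}
var_dump($result);
```

Whether $result is a truncated string or false depends on the PHP version, so the caller should check both the result and the message.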
 [2006-08-11 13:15 UTC] stronk7 at moodle dot org
Thanks, I agree!

But shouldn't the //TRANSLIT mode change that behaviour and 
allow the conversion to continue? If not, perhaps a minor 
change to the manual page would help, because it currently 
reads as if both the //TRANSLIT and //IGNORE modes continue 
with the conversion, and that's not true for //TRANSLIT.
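
Until that is clarified, one possible workaround is to clean the input before it reaches iconv(). The sketch below is only an illustration: euc_jp_to_utf8_lenient() is a hypothetical helper, and it relies on the assumption that the byte 0xA0 is never a lead or trail byte in valid EUC-JP, so replacing it bytewise cannot split a real multibyte character.

```php
<?php
// Hypothetical helper, not part of PHP or libiconv.
function euc_jp_to_utf8_lenient($input, $to = 'UTF-8//TRANSLIT')
{
    // Assumption: 0xA0 never occurs inside a valid EUC-JP sequence, so a
    // plain byte replacement is safe. Stray 0xA0 bytes become ASCII spaces.
    $cleaned = str_replace("\xA0", ' ', $input);
    return iconv('EUC-JP', $to, $cleaned);
}

var_dump(euc_jp_to_utf8_lenient('Hello' . chr(0xA0) . 'World'));
```

This only covers the single stray byte discussed in this report; any other illegal byte in the input will still stop the conversion.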
 