PHP :: Bug #38425 :: iconv's //TRANSLIT not working properly under euc-jp

Bug #38425	iconv's //TRANSLIT not working properly under euc-jp
Submitted:	2006-08-11 11:19 UTC	Modified:	2006-08-11 13:15 UTC
From:	stronk7 at moodle dot org	Assigned:
Status:	Not a bug	Package:	ICONV related
PHP Version:	5.1.4	OS:	Irrelevant
Private report:	No	CVE-ID:	None

[2006-08-11 11:19 UTC] stronk7 at moodle dot org

Description:
------------
I'm using iconv (with the //TRANSLIT option enabled) in order 
to convert from a lot of different encodings to UTF-8.

Everything seems to be working fine, but I've found one 
situation where, perhaps, //TRANSLIT isn't working properly.

It happens when I'm trying to convert from euc-jp to utf-8 
and the string contains some 0xA0 chars.

The whole string is English text, but some of the spaces between 
words use the 0xA0 char (instead of the correct 0x20 
char).

I'm not an expert on euc-jp, but it seems that the 0xA0 
char has no meaning at all in that encoding.

http://lfw.org/text/jp.html#euc

Perhaps it shouldn't break the conversion process, at least 
when running under //TRANSLIT mode? 

The //IGNORE mode seems to work OK, ignoring such a character. 
If I read the documentation correctly, //TRANSLIT 
should transform such a character (hopefully into a space, if 
I'm not wrong).

[2006-08-11 12:19 UTC] tony2001@php.net

Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.

[2006-08-11 12:51 UTC] stronk7 at moodle dot org

Here is an example script:

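<?php

    // Show which PHP version produced the output.
    echo phpversion();

    // English text with a 0xA0 byte (decimal 160) where the space should be.
    $original = 'Hello' . chr(0xA0) . 'World';
    
    // Convert from EUC-JP to UTF-8 with transliteration requested.
    $result = iconv('EUC-JP', 'UTF-8//TRANSLIT', $original);

    echo $result;
?>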

IMO, it should return:

"Hello?World" (or, if I'm not wrong, "Hello World")

Instead it returns "Hello" and stops, whereas the //TRANSLIT 
mode should let the conversion reach the end of the string.

[2006-08-11 12:58 UTC] tony2001@php.net

Notice: iconv(): Detected an illegal character in input string
Even though the character might be legal, it's up to libiconv to decide if it's legal or not and PHP can't change that.
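
Because the only sign of trouble is the notice quoted above, a caller may want to turn it into a hard failure instead of silently working with a truncated string. The following is a minimal sketch, not part of PHP itself: `iconv_strict` is a hypothetical helper name, and the code assumes PHP 7 or later (for the type declarations) with the iconv extension loaded.

```php
<?php
// Hypothetical helper (not part of PHP): turn any diagnostic raised by
// iconv() into an exception so a failed conversion cannot go unnoticed.
function iconv_strict(string $from, string $to, string $bytes): string
{
    // Temporarily route notices and warnings to an exception.
    set_error_handler(function (int $severity, string $message): bool {
        throw new RuntimeException($message);
    });
    try {
        $converted = iconv($from, $to, $bytes);
    } finally {
        // Always restore the previous handler, even when the handler threw.
        restore_error_handler();
    }
    // Also treat a plain false return value as a failure.
    if ($converted === false) {
        throw new RuntimeException('iconv() returned false');
    }
    return $converted;
}

// Usage: the 0xA0 byte from the report makes the conversion fail loudly.
try {
    echo iconv_strict('EUC-JP', 'UTF-8//TRANSLIT', 'Hello' . chr(0xA0) . 'World'), "\n";
} catch (RuntimeException $e) {
    echo 'Conversion failed: ', $e->getMessage(), "\n";
}
```

The exact message text depends on the underlying iconv implementation, so the sketch only relies on the fact that some diagnostic is raised.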

[2006-08-11 13:15 UTC] stronk7 at moodle dot org

Thanks, I agree!

But shouldn't the //TRANSLIT mode modify such behaviour and 
allow the conversion to continue? If not, perhaps some minor 
modification to the manual page could help, because it suggests 
that both the //TRANSLIT and //IGNORE modes continue with the 
conversion, and that's not true for the //TRANSLIT mode.
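
One possible workaround, given that //TRANSLIT does not rescue illegal input bytes, is to clean the string before calling iconv(). The sketch below is an assumption-laden illustration rather than a recommended fix: `replace_stray_a0` is a hypothetical helper, and it relies on the observation from the report that 0xA0 has no meaning in EUC-JP, so replacing it cannot damage a valid multi-byte character. The //TRANSLIT suffix is left out here because UTF-8 can represent every Unicode character, so there is nothing to transliterate on the output side.

```php
<?php
// Hypothetical helper (not part of PHP): replace stray 0xA0 bytes with an
// ASCII space (0x20) before the string is handed to iconv().
// Assumption: the input is EUC-JP, where 0xA0 is never part of a valid
// character, so a plain byte replacement is safe.
function replace_stray_a0(string $bytes): string
{
    return str_replace("\xA0", ' ', $bytes);
}

$original = 'Hello' . chr(0xA0) . 'World';

// After the clean-up the input is plain ASCII, which is valid EUC-JP.
$result = iconv('EUC-JP', 'UTF-8', replace_stray_a0($original));

echo $result, "\n"; // prints "Hello World"
```

This only makes sense per source encoding: in other encodings, such as ISO-8859-1, 0xA0 is a legitimate character and should be left for iconv() to convert.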

Copyright © 2001-2025 The PHP Group All rights reserved.	Last updated: Mon Oct 27 19:00:02 2025 UTC