Bug #32880 Unable to properly convert from ISO-8859-1 to UTF-8
Submitted: 2005-04-28 23:46 UTC Modified: 2005-04-29 01:49 UTC
From: Diomedes_01 at yahoo dot com Assigned:
Status: Not a bug Package: Strings related
PHP Version: 5.0.4 OS: Solaris 9
Private report: No CVE-ID: None
 [2005-04-28 23:46 UTC] Diomedes_01 at yahoo dot com
Description:
------------
I am unable to properly encode certain strings from ISO-8859-1 to UTF-8. I have tried using utf8_encode, mb_convert_encoding and iconv with no success. The code I am attempting this on is as follows:

Reproduce code:
---------------
<?php
$main_test_string = "r\xE9f\xE9rendum sur la Constitution europ\xE9enne"; // ISO-8859-1 bytes: each "é" is the single byte 0xE9
$string_test = mb_detect_encoding($main_test_string, 'UTF-8, ISO-8859-1');
echo "Encoding used: $string_test<br>"; // Properly displays ISO-8859-1

// First try converting with iconv
$iconv_test = iconv("ISO-8859-1", "UTF-8", $main_test_string);
echo "Iconv test: $iconv_test<br>"; // Displays nothing. No data whatsoever

// Now try converting with mb_convert_encoding
$mb_test = mb_convert_encoding($main_test_string, "UTF-8", "ISO-8859-1");
$string_test2 = mb_detect_encoding($mb_test, 'UTF-8, ISO-8859-1'); 
echo "Encoding used: $string_test2<br>"; // Indicates string is now UTF-8 encoded (which is wrong)
echo "MB Test convert value: $mb_test<br>"; // Displays: référendum sur la Constitution européenne; doesn't look like UTF-8 to me

// Finally try utf8_encode
$utf8_encode_test = utf8_encode($main_test_string);
$string_test3 = mb_detect_encoding($utf8_encode_test, 'UTF-8, ISO-8859-1');
echo "Encoding used: $string_test3<br>"; // Indicates string is now UTF-8 encoded (which is wrong)
echo "Abstract post conversion: $utf8_encode_test<br>"; // Same as before, displays: référendum sur la Constitution européenne 
?>

Expected result:
----------------
I should be seeing UTF-8 (Unicode) translated text of the style:
'&#917;&#955;&#955;&#951;&#957;&#953;'

Note that the above does work for non-Latin character sets such as Chinese, Japanese, Russian, Greek, etc.

Actual result:
--------------
What I am seeing is the following string:

référendum sur la Constitution européenne

Definitely not UTF-8. Could be Klingon. :-)

I will admit I am not a Unicode master, but this is certainly quite puzzling. According to the documentation, iconv is supposed to work in this case, but it is not displaying any data. I am running PHP 5.0.4 with iconv enabled (I see it in my phpinfo output).

Please advise.

 [2005-04-29 01:49 UTC] sniper@php.net
Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.

What you're getting really is a UTF-8 encoded string.
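
For readers who want to check this themselves, here is a minimal sketch. It assumes the mbstring extension is loaded, and the variable names are illustrative. It converts the ISO-8859-1 test string, confirms the result is valid UTF-8 by inspecting the bytes, and reproduces the garbled look a page gets when UTF-8 bytes are read as ISO-8859-1. That this is what the reporter saw is an assumption and is not confirmed in the report. The last step shows one possible way to get the numeric-entity style mentioned under "Expected result".

```php
<?php
// ISO-8859-1 input written with byte escapes, so the encoding of this file does not matter.
$latin1 = "r\xE9f\xE9rendum sur la Constitution europ\xE9enne";

// Convert to UTF-8: each 0xE9 byte becomes the two-byte sequence 0xC3 0xA9.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

$isValidUtf8 = mb_check_encoding($utf8, 'UTF-8');   // validity check on the raw bytes
$firstBytes  = bin2hex(substr($utf8, 0, 3));        // hex of "r" followed by the encoded "é"
$extraBytes  = strlen($utf8) - strlen($latin1);     // one extra byte per accented character

// Assumption: a page served or viewed as ISO-8859-1 treats each UTF-8 byte as a
// separate character. Re-encoding the UTF-8 bytes as if they were ISO-8859-1
// reproduces that appearance.
$asSeenInLatin1 = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// Numeric character references for everything outside ASCII.
// Map layout: [start code point, end code point, offset, mask].
$entities = mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

var_dump($isValidUtf8);
echo $firstBytes, "\n";
echo $extraBytes, "\n";
echo $asSeenInLatin1, "\n";
echo $entities, "\n";
```

If the converted text shows up as two odd characters where each "é" should be, the usual cause is the charset declared for the page, not the conversion. Sending `header('Content-Type: text/html; charset=UTF-8');` before any output is one way to tell the browser how to decode it.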
