[2005-04-29 01:49 UTC] sniper@php.net
Description:
------------
I am unable to properly encode certain strings from ISO-8859-1 to UTF-8. I have tried utf8_encode, mb_convert_encoding and iconv, with no success. The code I am attempting this with is as follows:

Reproduce code:
---------------
<?php
// The string is ISO-8859-1 encoded (each "é" is the single byte 0xE9)
$main_test_string = "référendum sur la Constitution européenne";
$string_test = mb_detect_encoding($main_test_string, 'UTF-8, ISO-8859-1');
echo "Encoding used: $string_test<br>"; // Properly displays ISO-8859-1

// First try converting with iconv
$iconv_test = iconv("ISO-8859-1", "UTF-8", $main_test_string);
echo "Iconv test: $iconv_test<br>"; // Displays nothing. No data whatsoever

// Now try converting with mb_convert_encoding
$mb_test = mb_convert_encoding($main_test_string, "UTF-8", "ISO-8859-1");
$string_test2 = mb_detect_encoding($mb_test, 'UTF-8, ISO-8859-1');
echo "Encoding used: $string_test2<br>"; // Indicates string is now UTF-8 encoded (which is wrong)
echo "MB Test convert value: $mb_test<br>"; // Displays: référendum sur la Constitution européenne; doesn't look like UTF-8 to me

// Finally try utf8_encode
$utf8_encode_test = utf8_encode($main_test_string);
// Detect the encoding of the converted string (was $textfieldabstract, which is undefined here)
$string_test3 = mb_detect_encoding($utf8_encode_test, 'UTF-8, ISO-8859-1');
echo "Encoding used: $string_test3<br>"; // Indicates string is now UTF-8 encoded (which is wrong)
echo "Abstract post conversion: $utf8_encode_test<br>"; // Same as before, displays: référendum sur la Constitution européenne
?>

Expected result:
----------------
I should be seeing UTF-8 (Unicode) translated text of the style: 'Ελληνι'

Note that the above does work for non-Latin character sets such as Chinese, Japanese, Russian, Greek, etc.

Actual result:
--------------
What I am seeing is the following string:

référendum sur la Constitution européenne

Definitely not UTF-8. Could be Klingon. :-) I will admit I am not a Unicode master, but this is certainly quite puzzling.

According to the documentation, iconv is supposed to work in this case, but it is not displaying any data. I am running PHP 5.0.4 with iconv enabled (I see it in my phpinfo output). Please advise.
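
One way to check whether a conversion actually took place, independent of how the browser renders the page, is to compare the raw bytes before and after. The following is only a minimal sketch, assuming the input really is ISO-8859-1 (where "é" is the single byte 0xE9); the variable names are illustrative and not part of the original report.

```php
<?php
// Assumption: ISO-8859-1 input, written with explicit byte escapes so the
// result does not depend on the encoding of this source file.
$latin1 = "r\xE9f\xE9rendum";

// Convert with two of the calls used in the report.
$viaIconv = iconv("ISO-8859-1", "UTF-8", $latin1);
$viaMb    = mb_convert_encoding($latin1, "UTF-8", "ISO-8859-1");

// Compare raw bytes instead of trusting the rendered page.
// In UTF-8, "é" is the two-byte sequence 0xC3 0xA9.
echo "input: ", bin2hex($latin1), "\n";
echo "iconv: ", ($viaIconv === false ? "(iconv failed)" : bin2hex($viaIconv)), "\n";
echo "mb:    ", bin2hex($viaMb), "\n";

// mb_check_encoding() reports whether the bytes form valid UTF-8.
var_dump(mb_check_encoding($latin1, "UTF-8"));
var_dump(mb_check_encoding($viaMb, "UTF-8"));
```

If the converted strings contain c3a9 where the input had e9, the bytes are UTF-8, and any odd-looking output is more likely down to the charset the page is displayed in than to the conversion itself.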