[2006-04-06 11:49 UTC] ynynmzvqofeaz at mailinator dot com
Description:
------------
mb_detect_encoding returns the wrong result when the text contains a trailing accent. See http://www.php.net/manual/en/function.mb-detect-encoding.php#55228
*sigh*

<?php echo mb_detect_encoding("test?"); ?>

Use iconv -f utf-8 -t iso-8859-1 INFILE > OUTFILE if "file OUTFILE" says utf-8.

Ignore the last comment. Do this:

Create two files with the following content, and name them test_iso1 and test_utf8:

<?php echo mb_detect_encoding("test?"); ?>

Make sure the encoding is correct:

$ file test_iso1
should return iso-8859-1

$ file test_utf8
should return utf-8

If they do not return the correct encoding, use iconv to convert them, e.g.

$ iconv -f utf-8 -t iso-8859-1 test_iso1 >test_iso1.fixed
or
$ iconv -f iso-8859-1 -t utf-8 test_utf8 >test_utf8.fixed

Now run each script. The test_iso1 script should report ISO-8859-1, and the test_utf8 script should report UTF-8.

Workaround: append an extra character to the end of the string, and then remove it(!)

It is not a bug, it is a specification. You should use 'strict' mode in mb_detect_encoding() if you need it to return the correct result.

mb_detect_encoding() treats the string as a byte stream.

{0x61,0x63,0x63,0x65,0x6e,0x74,0x75,0xe9} is a correct UTF-8 byte stream. In this case, 0xe9 is treated as the first byte of a multibyte character whose remaining bytes have not arrived yet.

{0x61,0x63,0x63,0x65,0x6e,0x74,0x75,0xe9,0x65} is a wrong UTF-8 byte stream, because 0xe9 0x65 is an invalid byte sequence in UTF-8.

If you need to exclude the incomplete multibyte character from detection, please try the 'strict' option, like:

echo mb_detect_encoding($s1 , 'UTF-8, ISO-8859-1',true);
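
For anyone trying to reproduce this, here is a minimal sketch of the strict-mode suggestion from the last comment. It assumes the mbstring extension is loaded. The value of $s1 is an assumption: it is built from the byte list quoted above, written with an explicit \xE9 escape so that the encoding of the script file itself does not affect the test. The mb_check_encoding() call is an extra cross-check that is not part of the original advice.

```php
<?php
// Assumed test input: the bytes 61 63 63 65 6e 74 75 e9 quoted in the thread,
// i.e. "accentu" followed by a lone 0xE9 (an ISO-8859-1 e-acute).
$s1 = "accentu\xE9";

// Strict mode (third argument true): a string ending in an incomplete
// multibyte sequence no longer qualifies as UTF-8, so detection moves on
// to the next candidate in the list.
$detected = mb_detect_encoding($s1, 'UTF-8, ISO-8859-1', true);
echo $detected, "\n";

// Cross-check: validate the whole string against UTF-8 directly.
$isUtf8 = mb_check_encoding($s1, 'UTF-8');
var_dump($isUtf8);
```

The result of the non-strict call has varied between PHP versions, so it is left out here; passing true as the third argument is the part the comment above actually recommends.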