  [2005-12-16 17:18 UTC] matteo at beccati dot com
 Description:
------------
I was evaluating the mbstring extension because of its ability to filter and convert input parameters to the correct encoding. During my tests I found that an ISO-8859-1 string ending with an accented character is wrongly detected as UTF-8, even though it ends with an incomplete multibyte character (converting the string with iconv raises a notice to that effect).
Also reproduced with PHP 4.3.11 on FreeBSD 4 and 5.0.2 on Win32.
Reproduce code:
---------------
<?php
error_reporting(E_ALL);
mb_detect_order('ASCII,UTF-8,ISO-8859-1');
// \xE0 is ISO-8859-1 small a grave char
test_bug("Test: \xE0");
test_bug("Test: \xE0a");
function test_bug($s) {
    echo "Trying: ";
    var_dump($s);
    iconv('UTF-8', 'UCS-2', $s); // result discarded: only called to show iconv's notice
    echo "Detected encoding: ".mb_detect_encoding($s)."\n";
    echo "Converted string:";
    var_dump(mb_convert_encoding($s, 'UTF-8',
        'ASCII,UTF-8,ISO-8859-1'));
    echo "\n";
}
?>
Expected result:
----------------
Trying: string(7) "Test: ?"
Notice: iconv(): Detected an incomplete multibyte character in input string in test.php on line 13
Detected encoding: ISO-8859-1
Converted string:string(8) "Test: ? "
Trying: string(8) "Test: ?a"
Notice: iconv(): Detected an illegal character in input string in /var/www/mbstring/test.php on line 13
Detected encoding: ISO-8859-1
Converted string:string(9) "Test: ? a"
Actual result:
--------------
Trying: string(7) "Test: ?"
Notice: iconv(): Detected an incomplete multibyte character in input string in test.php on line 13
Detected encoding: UTF-8
Converted string:string(6) "Test: "
Trying: string(8) "Test: ?a"
Notice: iconv(): Detected an illegal character in input string in /var/www/mbstring/test.php on line 13
Detected encoding: ISO-8859-1
Converted string:string(9) "Test: ? a"
Copyright © 2001-2025 The PHP Group. All rights reserved. Last updated: Wed Oct 22 20:00:01 2025 UTC
I've made a patch which seems to fix the issue. It basically checks the filter status during the judgement step: the status seems to be != 0 only while a filter is in the middle of matching a multibyte character. I also added a fallback to the old judgement routine, just in case no matching encoding is found.

Index: ext/mbstring/libmbfl/mbfl/mbfilter.c
===================================================================
RCS file: /repository/php-src/ext/mbstring/libmbfl/mbfl/mbfilter.c,v
retrieving revision 1.7.2.1
diff -u -r1.7.2.1 mbfilter.c
--- ext/mbstring/libmbfl/mbfl/mbfilter.c	5 Nov 2005 04:49:57 -0000	1.7.2.1
+++ ext/mbstring/libmbfl/mbfl/mbfilter.c	16 Dec 2005 22:46:26 -0000
@@ -575,12 +575,22 @@
 	for (i = 0; i < num; i++) {
 		filter = &flist[i];
-		if (!filter->flag) {
+		if (!filter->flag && !filter->status) {
 			encoding = filter->encoding;
 			break;
 		}
 	}
 
+	if (!encoding) {
+		for (i = 0; i < num; i++) {
+			filter = &flist[i];
+			if (!filter->flag) {
+				encoding = filter->encoding;
+				break;
+			}
+		}
+	}
+
 	/* cleanup */
 	/* dtors should be called in reverse order */
 	i = num;
 	while (--i >= 0) {