Bug #71183 mb_detect_encoding wrong with Windows-1252
Submitted: 2015-12-21 18:28 UTC Modified: 2015-12-21 19:14 UTC
From: martin at sutunam dot com Assigned:
Status: Not a bug Package: mbstring related
PHP Version: 5.5.30 OS: Ubuntu
Private report: No CVE-ID: None
 [2015-12-21 18:28 UTC] martin at sutunam dot com
mb_detect_encoding does not detect strings encoded in Windows-1252 (cp1252).

The function is supposed to return the first compatible encoding from the encoding list passed as the second parameter.

Even with an ASCII string like 'aaa', it refuses the Windows-1252 encoding.

Test script:
$enc = 'Windows-1252';
if (!in_array($enc, mb_list_encodings())) {
    die($enc.' not supported');
}

$val = '100 '. 0x80; // € sign in cp1252
echo mb_detect_encoding($val, $enc.',ISO-8859-1', true).PHP_EOL;

$val = 'aaa';
echo mb_detect_encoding($val, $enc.',ISO-8859-1', true).PHP_EOL;

Expected result:

Actual result:


 [2015-12-21 19:14 UTC]
-Status: Open +Status: Not a bug
 [2015-12-21 19:14 UTC]
Thanks for the report. 

Firstly, 0x80 is an integer literal, so '100 ' . 0x80 appends the number 128 as the string "128", not the byte 0x80. Use chr(0x80) instead.
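
The difference is easy to see by dumping the bytes. This is a minimal sketch, assuming the reporter meant to append the single byte 0x80 (the euro sign in Windows-1252); the variable names are illustrative only:

```php
<?php
// 0x80 is the integer 128; concatenation casts it to the string "128".
$asNumber = '100 ' . 0x80;
// chr() returns the one-byte string for the given byte value.
$asByte = '100 ' . chr(0x80);

echo $asNumber, PHP_EOL;         // 100 128
echo bin2hex($asByte), PHP_EOL;  // 3130302080
echo strlen($asByte), PHP_EOL;   // 5
```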

Secondly, a longer string could have more success. It is generally hard to identify a single-byte encoding, and it is almost impossible with just 5 bytes that are all digits and whitespace. Detection can even fail sometimes for multibyte encodings. This is unlikely to be a bug.
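
Since single-byte encodings largely accept the same byte sequences, one possible workaround is to test only for UTF-8, which can be validated, and otherwise convert from an encoding known from the data source. The sketch below assumes the mbstring extension is loaded; toUtf8() is a hypothetical helper, not part of PHP, and the Windows-1252 default is an assumption about the input:

```php
<?php
// Hypothetical helper: return UTF-8, falling back to a caller-supplied
// single-byte encoding when the input is not valid UTF-8.
function toUtf8($bytes, $fallback = 'Windows-1252')
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8 (plain ASCII included)
    }
    // The fallback is an assumption about the source, not a detection result.
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

echo toUtf8('aaa'), PHP_EOL;                        // aaa
echo bin2hex(toUtf8('100 ' . chr(0x80))), PHP_EOL;  // 31303020e282ac
```

The second call shows byte 0x80 becoming the three-byte UTF-8 sequence for the euro sign, which is what the reporter's comment intended.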
