Bug #71183 mb_detect_encoding wrong with Windows-1252
Submitted: 2015-12-21 18:28 UTC Modified: 2015-12-21 19:14 UTC
From: martin at sutunam dot com Assigned:
Status: Not a bug Package: mbstring related
PHP Version: 5.5.30 OS: Ubuntu
Private report: No CVE-ID: None
 [2015-12-21 18:28 UTC] martin at sutunam dot com
Description:
------------
mb_detect_encoding does not detect strings encoded as Windows-1252 (cp1252).

The function is supposed to return the first compatible encoding from the encoding list passed as the second parameter.

Even with an ASCII string like 'aaa', it refuses the Windows-1252 encoding.

Test script:
---------------
<?php
$enc = 'Windows-1252';
if (!in_array($enc, mb_list_encodings())) {
    die($enc.' not supported');
}

$val = '100 '. 0x80; // € sign in cp1252
echo mb_detect_encoding($val, $enc.',ISO-8859-1', true).PHP_EOL;

$val = 'aaa';
echo mb_detect_encoding($val, $enc.',ISO-8859-1', true).PHP_EOL;

Expected result:
----------------
Windows-1252
Windows-1252

Actual result:
--------------
ISO-8859-1
ISO-8859-1

 [2015-12-21 19:14 UTC] ab@php.net
-Status: Open +Status: Not a bug
 [2015-12-21 19:14 UTC] ab@php.net
Thanks for the report. 

Firstly, with 0x80 you append a number, not a char. Use chr(0x80).
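
As a minimal sketch of the difference (the variable names here are illustrative only): concatenating the integer literal 0x80 appends its decimal digits, while chr(0x80) appends the single byte 0x80.

```php
<?php
// 0x80 is an integer literal (128); string concatenation turns it into "128".
$asNumber = '100 ' . 0x80;

// chr(0x80) returns a one-byte string whose byte value is 0x80,
// which is the euro sign in Windows-1252.
$asByte = '100 ' . chr(0x80);

var_dump($asNumber);             // string(7) "100 128"
echo strlen($asByte), PHP_EOL;   // 5
echo bin2hex($asByte), PHP_EOL;  // 3130302080
```

Only the second string actually contains a byte outside the ASCII range.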

Secondly, a longer string could have more success. Single-byte encodings are hard to tell apart in general, and it is almost impossible when passing just 5 bytes that are all digits and white space. Detection can sometimes fail even for multibyte encodings. This is unlikely to be related to buggy behavior in any way.
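
To illustrate the ambiguity, the sketch below lists which candidate encodings accept a given byte string as valid. The helper name compatibleEncodings is hypothetical and not part of mbstring; it relies only on mb_check_encoding, and it is not meant to reproduce how mb_detect_encoding ranks candidates internally.

```php
<?php
// Hypothetical helper: returns every candidate encoding in which $bytes is valid.
function compatibleEncodings($bytes, array $candidates)
{
    $valid = array();
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($bytes, $encoding)) {
            $valid[] = $encoding;
        }
    }
    return $valid;
}

$candidates = array('Windows-1252', 'ISO-8859-1');

// Pure ASCII is valid in both encodings, so the bytes alone cannot tell them apart.
$asciiMatches = compatibleEncodings('aaa', $candidates);
print_r($asciiMatches);

// The same check for the 5-byte string from the report, built with chr(0x80).
print_r(compatibleEncodings('100 ' . chr(0x80), $candidates));
```

When more than one candidate accepts the input, the bytes carry no information that favors one over the other, so if the source encoding is already known it is safer to state it explicitly than to detect it.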

Thanks.
 