Bug #48210 mb_detect_encoding() fails to detect right charset
Submitted: 2009-05-09 17:53 UTC Modified: 2009-05-09 19:10 UTC
From: nilon at kartio dot org Assigned:
Status: Not a bug Package: mbstring related
PHP Version: 5.2.9 OS: Debian Lenny
Private report: No CVE-ID: None
 [2009-05-09 17:53 UTC] nilon at kartio dot org
Description:
------------
mb_detect_encoding() detects the latin1 character 'ä' (byte \xe4) as UTF-8 when it clearly isn't a multibyte character.

Reproduce code:
---------------
<?php
var_dump(mb_detect_encoding("\xe4", 'UTF-8, ISO-8859-1'));
var_dump(mb_detect_encoding("\xe4", 'ISO-8859-1, UTF-8'));
var_dump(mb_detect_encoding("\xe4", 'ISO-8859-1'));
var_dump(mb_detect_encoding("\xe4", 'UTF-8'));
?>

Expected result:
----------------
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
bool(false)

Actual result:
--------------
string(5) "UTF-8"
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(5) "UTF-8"

 [2009-05-09 17:58 UTC] nilon at kartio dot org
With the strict option, the result is:
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(5) "UTF-8"

Still, the last one should return false.
 [2009-05-09 19:10 UTC] jani@php.net
Please note that encoding detection is not always perfect.
Especially when the string is very short, the detection may return
the wrong encoding.
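
If what is needed is validation rather than a guess, one possible workaround is to test each candidate with mb_check_encoding(), which only reports whether the bytes are valid in a given encoding. The following is a minimal sketch, not a fix for mb_detect_encoding() itself; first_valid_encoding() is a hypothetical helper name, not part of mbstring. Note that ISO-8859-1 accepts any byte sequence, so it should come last in the candidate list.

```php
<?php
// Hypothetical helper (not part of mbstring): returns the first candidate
// encoding in which $bytes is a valid byte sequence, or false if none match.
function first_valid_encoding($bytes, array $candidates)
{
    foreach ($candidates as $encoding) {
        // mb_check_encoding() validates the bytes; it does not guess.
        if (mb_check_encoding($bytes, $encoding)) {
            return $encoding;
        }
    }
    return false;
}

// A lone \xe4 is a truncated multibyte lead byte in UTF-8, so UTF-8 is rejected.
var_dump(first_valid_encoding("\xe4", array('UTF-8', 'ISO-8859-1'))); // string(10) "ISO-8859-1"
var_dump(first_valid_encoding("\xe4", array('UTF-8')));               // bool(false)
```

With this approach the order of candidates matters: list the stricter encodings (such as UTF-8) before single-byte encodings that accept every byte.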
 