Bug #48210 mb_detect_encoding() fails to detect right charset
Submitted: 2009-05-09 17:53 UTC Modified: 2009-05-09 19:10 UTC
From: nilon at kartio dot org Assigned:
Status: Not a bug Package: mbstring related
PHP Version: 5.2.9 OS: Debian Lenny
Private report: No CVE-ID: None
 [2009-05-09 17:53 UTC] nilon at kartio dot org
Description:
------------
mb_detect_encoding() detects the Latin-1 character 'ä' (the single byte 0xE4) as UTF-8, even though a lone 0xE4 byte is clearly not a valid multibyte sequence.

Reproduce code:
---------------
<?php
var_dump(mb_detect_encoding("\xe4", 'UTF-8, ISO-8859-1'));
var_dump(mb_detect_encoding("\xe4", 'ISO-8859-1, UTF-8'));
var_dump(mb_detect_encoding("\xe4", 'ISO-8859-1'));
var_dump(mb_detect_encoding("\xe4", 'UTF-8'));
?>

Expected result:
----------------
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
bool(false)

Actual result:
--------------
string(5) "UTF-8"
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(5) "UTF-8"

 [2009-05-09 17:58 UTC] nilon at kartio dot org
With the strict option enabled, the result is:
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(5) "UTF-8"

The last call should still return false.
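
For reference, the strict variant of the reproduce code presumably looks like the sketch below, with `true` passed as the third argument of mb_detect_encoding(). The loop and the `$lists` variable are only there for brevity and are not part of the original report. No output is shown because the result depends on the PHP version.

```php
<?php
// Same input byte and encoding lists as the reproduce code above,
// with the third argument ($strict) set to true.
$lists = array('UTF-8, ISO-8859-1', 'ISO-8859-1, UTF-8', 'ISO-8859-1', 'UTF-8');

foreach ($lists as $list) {
    // Returns the detected encoding name, or false if none of the candidates match.
    var_dump(mb_detect_encoding("\xe4", $list, true));
}
```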
 [2009-05-09 19:10 UTC] jani@php.net
Please note that encoding detection is not always reliable.
In particular, when the string is very short, the wrong encoding
may be detected.
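
When the question is whether a string is well-formed in a given encoding, rather than which encoding it most likely uses, validating with mb_check_encoding() may be a better fit than detection. The following is a minimal sketch of that idea. The helper name `first_valid_encoding` is hypothetical and not part of mbstring; it returns the first candidate in which the bytes are valid, so the order of the candidate list matters.

```php
<?php
// Hypothetical helper (not an mbstring function): return the first
// candidate encoding in which $bytes is well-formed, or false if none is.
function first_valid_encoding($bytes, array $candidates)
{
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($bytes, $encoding)) {
            return $encoding;
        }
    }
    return false;
}

// A lone 0xE4 byte is not well-formed UTF-8.
var_dump(mb_check_encoding("\xe4", 'UTF-8'));

// 0xC3 0xA4 is the two-byte UTF-8 encoding of U+00E4.
var_dump(mb_check_encoding("\xc3\xa4", 'UTF-8'));

// List UTF-8 first: almost any byte string is valid ISO-8859-1,
// so a single-byte encoding should only act as the fallback.
var_dump(first_valid_encoding("\xe4", array('UTF-8', 'ISO-8859-1')));
```

Because single-byte encodings such as ISO-8859-1 accept nearly any input, this approach only rules encodings out; it cannot prove that a string was actually written in the encoding it returns.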
 