Bug #47876 Wrong mb_detect_encoding() with string "chr᚝any"
Submitted: 2009-04-02 10:08 UTC Modified: 2009-04-21 01:00 UTC
Votes:5
Avg. Score:3.6 ± 0.8
Reproduced:5 of 5 (100.0%)
Same Version:1 (20.0%)
Same OS:1 (20.0%)
From: FrancS at seznam dot cz Assigned:
Status: No Feedback Package: mbstring related
PHP Version: 5.2.9 OS: Windows XP
Private report: No CVE-ID: None
 [2009-04-02 10:08 UTC] FrancS at seznam dot cz
Description:
------------
Hi,

Today I discovered a problem with the mbstring function mb_detect_encoding().

I have the string "chr᚝any" in Czech. It seems that this function always returns UTF-8, even if I load the text from a file encoded as "windows-1250" or "ISO-8859-2".


Reproduce code:
---------------
// test.txt is text file with charset "windows-1250" or "ISO-8859-2"

$string = file_get_contents('test.txt');

var_dump(mb_detect_encoding($string, mb_list_encodings(), true));

Expected result:
----------------
ISO-8859-2 or windows-1250 (whichever charset test.txt uses)

Actual result:
--------------
utf-8

 [2009-04-02 10:13 UTC] FrancS at seznam dot cz
I looked at it again, and the problem is with "᚝" in a word that contains no other accented characters.
 [2009-04-13 17:58 UTC] jani@php.net
What if you pass the function the list of possible encodings instead of
"auto", which always has UTF-8 first? Something like this:

echo mb_detect_encoding($str, "SJIS, sjis-win");
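
As a hedged illustration of the suggestion above, the sketch below passes an explicit candidate list together with strict mode. The sample bytes are an assumption made for illustration (0xBB is "ť" in ISO-8859-2), not the reporter's actual file contents, and behavior on PHP 5.2.9 may differ from current releases.

```php
<?php
// Hypothetical sample: "chr" + one ISO-8859-2 byte (0xBB) + "any".
// A lone 0xBB byte is not a valid UTF-8 sequence.
$str = "chr\xBBany";

// Explicit candidate list instead of "auto". The third argument enables
// strict mode, so encodings in which $str is invalid are rejected.
$detected = mb_detect_encoding($str, array('UTF-8', 'ISO-8859-2'), true);

var_dump($detected);
```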

 [2009-04-14 08:25 UTC] FrancS at seznam dot cz
That is because I want to use it to find out which encoding the input string is in. A user may send data in any of these encodings: UTF-8, windows-1250, or ISO-8859-2.

It is important for me to find out which encoding it is in.
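
One way to approach this requirement, sketched under assumptions: validate the input as UTF-8 first, and only if that fails treat it as a single-byte encoding. The helper names guess_input_encoding() and to_utf8() are hypothetical, and the ISO-8859-2 fallback is an assumption. windows-1250 and ISO-8859-2 cannot be told apart by byte validity alone, since most byte sequences are valid in both.

```php
<?php
// Hypothetical helper, not part of mbstring.
// Returns 'UTF-8' when $bytes is a valid UTF-8 sequence (this includes
// pure ASCII), otherwise the caller-supplied single-byte fallback.
function guess_input_encoding(string $bytes, string $fallback = 'ISO-8859-2'): string
{
    // mb_check_encoding() validates the whole byte sequence against UTF-8.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'UTF-8';
    }
    return $fallback;
}

// Hypothetical helper: normalize input to UTF-8 using the guess above.
function to_utf8(string $bytes, string $fallback = 'ISO-8859-2'): string
{
    $from = guess_input_encoding($bytes, $fallback);
    if ($from === 'UTF-8') {
        return $bytes; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($bytes, 'UTF-8', $from);
}
```

For example, to_utf8("chr\xBBany") would decode the 0xBB byte as ISO-8859-2, while input that is already valid UTF-8 is returned unchanged.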
 [2009-04-21 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 