Request #38138 Russian encoding detection support
Submitted: 2006-07-19 09:41 UTC Modified: 2011-04-08 18:09 UTC
Votes:59
Avg. Score:4.4 ± 1.0
Reproduced:39 of 42 (92.9%)
Same Version:14 (35.9%)
Same OS:17 (43.6%)
From: techtonik@php.net Assigned:
Status: Open Package: mbstring related
PHP Version: 5.* OS: *
Private report: No CVE-ID: None

 [2006-07-19 09:41 UTC] techtonik@php.net
Description:
------------
Detection of Russian encodings in mb_detect_encoding() is disabled, although they are present in the list of supported encodings. There are just three rather simple encodings (Windows-1251, CP866 and KOI8-R) that spoil the everyday routine of Russian programmers and make PHP less attractive to millions of potential PHP developers. I'd be grateful if somebody would take care of them by providing a default option for hosting providers, who are not too enthusiastic about experimenting with server-wide configuration.


Reproduce code:
---------------
<?php

$str = "?????? ?????? ??? ??????????? ????????? ???????. ??? ????? ?????????? ? ?????? farplugins ?? CVS. ???????? ?? ??????? ? ??????????? ????? ????????? ? ???????? ? ????????? project website ??? ? ?????? ???????? farplugins-devel.";
// $encoding = mb_detect_encoding($str, "UTF-8, Windows-1251, CP866, KOI8-R");
$encoding = mb_detect_encoding($str, array("UTF-8", "Windows-1251", "CP866", "KOI8-R"));

var_dump($encoding);


Expected result:
----------------
string(12) "Windows-1251"

Actual result:
--------------
bool(false)

History

 [2006-07-19 09:50 UTC] tony2001@php.net
Reclassified as a feature request, where it belongs.

techtonik, I'm sure you know email addresses of ext/mbstring maintainers and can contact them about it.
That said, I don't think this will ever appear in PHP6 (because mbstring itself doesn't make much sense there), and it definitely won't appear in PHP4 (it's time to upgrade, eh?).
 [2006-07-19 17:34 UTC] techtonik@php.net
Well, I can't say this is OK for me. At first I thought that simply configuring with --enable-mbstring=all would solve the problem, but it turned out that my dream host already has this option turned on. So autodetection of Russian is simply not enabled at the code level, i.e. i18n support via mbstring is somewhat crippled. I evaluated PHP6 for a few days, but it was very far from being complete, unfortunately.
 [2006-07-20 06:27 UTC] tony2001@php.net
>I evaluated PHP6 for a few days, but it was very far from
>being complete, unfortunately.
I wonder why.. probably because it's still 12+ months before the release? =) 
Feel free to help us, though. The documentation is not the only area that needs some help =)
 [2006-07-20 07:04 UTC] techtonik@php.net
I would like to, if anybody would explain "how to" port PHP functions to Unicode "for dummies". It would also be nice to have an environment for monitoring changes (Trac?) and tracking requirements. The latter would help analyze deprecated, inconvenient and obscure APIs (logical bugs) as a means of improving usability, e.g. unifying include_path delimiters on all platforms. The point is just to save some time and make occasional development (which is pretty much all I can manage) effective.
 [2009-01-21 08:46 UTC] Roman dot Kyrylych at gmail dot com
Here's a Russian encoding autodetector that can be used after mb_detect_encoding() returns false:
http://www.opennet.ru/base/dev/charset_autodetect.txt.html
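A minimal sketch of such a fallback step is shown below. It is not the algorithm from the linked article; it swaps in a simpler letter-frequency heuristic, and guess_cyrillic_charset() is a hypothetical helper name, not an mbstring function. It assumes the mbstring extension is loaded, that the input is Russian prose, and that the PHP source file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of mbstring): guesses which single-byte
// Cyrillic charset a string uses by decoding it with each candidate and
// counting common lowercase Russian letters. Heuristic sketch only.
function guess_cyrillic_charset($str, array $candidates = array("Windows-1251", "KOI8-R", "CP866"))
{
    // Valid UTF-8 is accepted first (note: pure ASCII also passes this check).
    if (mb_check_encoding($str, "UTF-8")) {
        return "UTF-8";
    }

    // Frequent lowercase letters in Russian prose (assumed signal).
    $frequent = array("о", "е", "а", "и", "н", "т", "с", "р", "в", "л");

    $best = false;
    $bestScore = 0;
    foreach ($candidates as $charset) {
        // Decode the raw bytes as if they were in $charset.
        $decoded = mb_convert_encoding($str, "UTF-8", $charset);

        // Score = total occurrences of the frequent letters.
        $score = 0;
        foreach ($frequent as $letter) {
            $score += substr_count($decoded, $letter);
        }

        if ($score > $bestScore) {
            $best = $charset;
            $bestScore = $score;
        }
    }

    // false when no candidate produced any frequent letter,
    // mirroring mb_detect_encoding()'s failure value.
    return $best;
}
```

The idea is that a wrong charset maps ordinary lowercase Russian letters onto uppercase letters, box-drawing characters or rarer letters, so the correct charset usually scores highest. Short or mostly-ASCII strings give a weak signal, so treat the result as a guess.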
 [2009-03-20 11:14 UTC] wips at mail dot ru
Another encoding detector, which works with UTF-8 too: http://popoff.donetsk.ua/file/text/libs/a.charset.php
 [2010-12-31 12:43 UTC] rustamabd at gmail dot com
Windows-1251, KOI8-R and CP866 are all single-byte CHARSETs, not ENCODINGs.

mb_detect_encoding() is not intended to distinguish between charsets, especially single-byte charsets. Its primary purpose is to detect which multibyte encoding is in use, e.g. UTF-8, UTF-16, Shift_JIS, etc.
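To illustrate the point, here is a small sketch (assuming mbstring is loaded and the file is saved as UTF-8) that encodes one Russian word as Windows-1251 and then decodes the same bytes as each of the three charsets. Every decoding succeeds and yields well-formed text, so byte validity alone gives a detector nothing to go on.

```php
<?php
// Sketch: the same bytes decode "successfully" under every single-byte
// Cyrillic charset, so a validity check alone cannot tell them apart.
$utf8  = "привет";                                            // "hello", stored as UTF-8
$bytes = mb_convert_encoding($utf8, "Windows-1251", "UTF-8"); // 6 bytes: EF F0 E8 E2 E5 F2

foreach (array("Windows-1251", "KOI8-R", "CP866") as $charset) {
    // Interpret the identical byte string as $charset and convert to UTF-8.
    $decoded = mb_convert_encoding($bytes, "UTF-8", $charset);
    printf("%-12s => %s\n", $charset, $decoded);
}
```

This should print привет for Windows-1251 and ОПХБЕР for KOI8-R, while CP866 produces yet another string of valid characters. Telling them apart requires statistics about the language, not encoding rules.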
 [2011-04-08 18:08 UTC] jani@php.net
-Package: Feature/Change Request +Package: *General Issues
 [2011-04-08 18:09 UTC] jani@php.net
-Package: *General Issues +Package: mbstring related -Operating System: +Operating System: * -PHP Version: 4.4.2 +PHP Version: 5.*
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Nov 22 04:01:28 2024 UTC