Bug #45086 Incorrect default names for chinese charsets
Submitted: 2008-05-24 05:38 UTC Modified: 2008-08-04 05:30 UTC
From: petr dot hroudny at gmail dot com Assigned:
Status: Not a bug Package: mbstring related
PHP Version: 5.2.6 OS: *
Private report: No CVE-ID: None
 [2008-05-24 05:38 UTC] petr dot hroudny at gmail dot com
Description:
------------
mbstring uses incorrect default names for two Chinese charsets:

1) The official IANA name is GB2312, but mbstring uses EUC-CN
2) The official IANA name is GBK, but mbstring uses CP936

Reproduce code:
---------------
print_r(mb_list_encodings());

Expected result:
----------------
mb_list_encodings() should return the official IANA names (GB2312, GBK)

Actual result:
--------------
mb_list_encodings() returns EUC-CN and CP936

 [2008-08-01 23:06 UTC] moriyoshi@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

mb_list_encodings() returns the list of encoding names that are local to the extension, and these are not necessarily the same as the IANA names. As another example, it returns "SJIS-win" for the encoding whose preferred MIME name in the IANA registry is "Windows-31J".
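
To see how the extension-local names relate to other names, mbstring can be asked directly. The following is a minimal sketch, assuming an mbstring build where mb_encoding_aliases() is available (PHP 5.3 or later, so newer than the 5.2.6 in this report). describe_encoding() is a hypothetical helper, not part of mbstring, and the exact strings printed depend on the PHP version.

```php
<?php
// Hypothetical helper (not part of mbstring): gather the names that
// mbstring associates with one of its extension-local encoding names.
function describe_encoding($localName)
{
    return array(
        'local'   => $localName,
        // MIME charset string mbstring prefers for this encoding.
        'mime'    => mb_preferred_mime_name($localName),
        // Other names mbstring accepts for the same encoding (PHP 5.3+).
        'aliases' => mb_encoding_aliases($localName),
    );
}

// The three local names mentioned in this report.
foreach (array('EUC-CN', 'CP936', 'SJIS-win') as $name) {
    $info = describe_encoding($name);
    printf(
        "%s: MIME name %s; aliases %s\n",
        $info['local'],
        $info['mime'],
        implode(', ', $info['aliases'])
    );
}
```

Comparing the alias lists against the IANA registry shows which registered names the extension already recognises, even though mb_list_encodings() itself reports only the local name.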


 [2008-08-01 23:08 UTC] moriyoshi@php.net
In addition, GB2312 is not the same as CP936, although the two character sets share most of their characters. (According to the spec, CP936 has a larger code space than GB2312.)
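
The difference in code space can be probed with a round-trip conversion. This is a rough sketch, assuming mbstring's default behaviour of substituting a placeholder for characters it cannot encode. round_trips() is a hypothetical helper, and U+9555 is used as a sample character that GBK defines but GB2312 does not.

```php
<?php
// Hypothetical helper: convert UTF-8 text to $encoding and back, and
// report whether the text survived unchanged. A character missing from
// the target charset is replaced during encoding, so it will not match.
function round_trips($utf8, $encoding)
{
    $encoded = mb_convert_encoding($utf8, $encoding, 'UTF-8');
    $decoded = mb_convert_encoding($encoded, 'UTF-8', $encoding);
    return $decoded === $utf8;
}

$samples = array(
    'U+4E2D' => "\xE4\xB8\xAD", // common character, present in GB2312
    'U+9555' => "\xE9\x95\x95", // defined in GBK, absent from GB2312
);

foreach ($samples as $label => $char) {
    foreach (array('EUC-CN', 'CP936') as $encoding) {
        printf(
            "%s via %s: %s\n",
            $label,
            $encoding,
            round_trips($char, $encoding) ? 'preserved' : 'lost'
        );
    }
}
```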


 [2008-08-04 05:30 UTC] petr dot hroudny at gmail dot com
I never said that GB2312 is the same as CP936.

There are two separate entries already: EUC-CN (GB2312) and CP936 (GBK).

I'm only saying that mb_list_encodings() should return official charset names and not "names local to the extension".

Not complying with standards is always a bad thing. Imagine that every PHP extension used a different name for the same charset: how would you write a program that has to be compatible with more than one extension?

If you still need the non-standard names for backward compatibility, please make sure that mb_list_encodings() returns the standard names as well as the non-standard ones.
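
For code that receives a charset name from elsewhere (an HTTP header, another extension), one possible workaround is to pass that name straight to mbstring as input, since mbstring accepts aliases in addition to its local names. The sketch below assumes that "GB2312" and "GBK" are registered as aliases of EUC-CN and CP936 in the mbstring build in use. to_utf8() is a hypothetical wrapper.

```php
<?php
// Hypothetical wrapper: decode $bytes using whatever charset name the
// caller received, relying on mbstring to resolve aliases on input.
function to_utf8($bytes, $charsetName)
{
    return mb_convert_encoding($bytes, 'UTF-8', $charsetName);
}

// 0xD6 0xD0 encodes U+4E2D in GB2312, and therefore in GBK as well.
$bytes = "\xD6\xD0";

foreach (array('EUC-CN', 'GB2312', 'CP936', 'GBK') as $name) {
    // Print the resulting UTF-8 bytes as hex so the output is terminal-safe.
    printf("%-6s -> %s\n", $name, bin2hex(to_utf8($bytes, $name)));
}
```

This only covers names given to mbstring; the names it reports through mb_list_encodings() are still the local ones.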
 