Bug #62545 wrong unicode mapping in some charsets
Submitted: 2012-07-12 23:04 UTC Modified: -
From: sageptr at gmail dot com Assigned:
Status: Closed Package: mbstring related
PHP Version: 5.4.4 OS: Any
Private report: No CVE-ID: None
 [2012-07-12 23:04 UTC] sageptr at gmail dot com
Description:
------------
ext/mbstring/libmbfl/filters/unicode_table_cp1251.h:
static const unsigned short cp1251_ucs_table[] = {
 0x0402, 0x0403, 0x201a, 0x0453, 0x201e, 0x2026, 0x2020, 0x2021, 
 0x20ac, 0x2030, 0x0409, 0x2039, 0x040a, 0x040c, 0x040b, 0x040f, 
 0x0452, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014, 
 0x003f, 0x2122, 0x0459, 0x203a, 0x045a, 0x045c, 0x045b, 0x045f, 
...
Byte 0x98 is mapped to 0x003f (question mark), but it is actually unmapped in
the cp1251 charset. It should be mapped to 0xfffd (the Unicode replacement
character), not to 0x003f; otherwise a decoded '?' cannot be told apart from an
undecodable byte.
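A minimal sketch of what the corrected lookup could look like. This is not libmbfl's actual code, and the names `cp1251_ucs_high` and `cp1251_high_to_ucs` are hypothetical. It assumes, as the listing above suggests, that the table starts at byte 0x80, so byte 0x98 is index 0x18. Only the 0x80..0x9f rows quoted above are reproduced, with entry 0x98 changed to 0xfffd.

```c
#include <stdint.h>

/* Hypothetical corrected copy of the 0x80..0x9f rows of cp1251_ucs_table
 * quoted in the report; only entry 0x98 differs (0x003f -> 0xfffd). */
static const uint16_t cp1251_ucs_high[32] = {
    0x0402, 0x0403, 0x201a, 0x0453, 0x201e, 0x2026, 0x2020, 0x2021,
    0x20ac, 0x2030, 0x0409, 0x2039, 0x040a, 0x040c, 0x040b, 0x040f,
    0x0452, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
    0xfffd, 0x2122, 0x0459, 0x203a, 0x045a, 0x045c, 0x045b, 0x045f,
};

/* Decode one CP1251 byte. ASCII passes through unchanged; 0x80..0x9f go
 * through the excerpted table. The 0xa0..0xff rows are elided in the
 * report, so this sketch does not decode them and reports U+FFFD. */
static uint32_t cp1251_high_to_ucs(uint8_t byte)
{
    if (byte < 0x80)
        return byte;
    if (byte <= 0x9f)
        return cp1251_ucs_high[byte - 0x80];
    return 0xfffd; /* outside the excerpt, not handled by this sketch */
}
```

With this table, `cp1251_high_to_ucs(0x98)` yields 0xfffd, so a caller can distinguish the unmapped byte from a genuine 0x3f ('?') in the input.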

ext/mbstring/libmbfl/filters/unicode_table_cp1252.h:
static const unsigned short cp1252_ucs_table[] = {
 0x20ac,0xfffe,0x201a,0x0192,0x201e,0x2026,0x2020,0x2021,
 0x02c6,0x2030,0x0160,0x2039,0x0152,0xfffe,0x017d,0xfffe,
 0xfffe,0x2018,0x2019,0x201c,0x201d,0x2022,0x2013,0x2014,
 0x02dc,0x2122,0x0161,0x203a,0x0153,0xfffe,0x017e,0x0178
};
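Missing characters (bytes 0x81, 0x8d, 0x8f, 0x90 and 0x9d) are mapped to 0xfffe.
But U+FFFE is not the replacement character that is expected here: it is a
noncharacter (the byte-swapped form of the BOM, U+FEFF). These entries should be
0xfffd.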
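A minimal sketch of a corrected CP1252 lookup plus a sanity check that rejects noncharacters in a mapping table. Again this is not libmbfl's code; `cp1252_ucs_high`, `cp1252_to_ucs`, `ucs_is_noncharacter` and `count_noncharacters` are hypothetical names. It assumes the table covers bytes 0x80..0x9f, and relies on the fact that CP1252 maps 0x00..0x7f and 0xa0..0xff to the identical code points.

```c
#include <stdbool.h>
#include <stdint.h>

/* Corrected copy of cp1252_ucs_table: the five unmapped bytes
 * (0x81, 0x8d, 0x8f, 0x90, 0x9d) use 0xfffd instead of 0xfffe. */
static const uint16_t cp1252_ucs_high[32] = {
    0x20ac, 0xfffd, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
    0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0xfffd, 0x017d, 0xfffd,
    0xfffd, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
    0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0xfffd, 0x017e, 0x0178,
};

/* Decode one CP1252 byte: only 0x80..0x9f need the table; the ASCII and
 * 0xa0..0xff ranges map to the same code point. */
static uint32_t cp1252_to_ucs(uint8_t byte)
{
    if (byte >= 0x80 && byte <= 0x9f)
        return cp1252_ucs_high[byte - 0x80];
    return byte;
}

/* U+xxFFFE / U+xxFFFF in every plane and U+FDD0..U+FDEF are Unicode
 * noncharacters; a decoder should never emit them for unmapped input. */
static bool ucs_is_noncharacter(uint32_t c)
{
    return (c & 0xfffe) == 0xfffe || (c >= 0xfdd0 && c <= 0xfdef);
}

/* Count table entries that are noncharacters (0 for a correct table). */
static int count_noncharacters(const uint16_t *table, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (ucs_is_noncharacter(table[i]))
            count++;
    return count;
}
```

Run against the original table quoted in the report, `count_noncharacters` would flag the five 0xfffe entries; against the corrected copy it returns 0.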


History

 [2018-03-11 17:21 UTC] cmb@php.net
Automatic comment on behalf of cmbecker69@gmx.de
Revision: http://git.php.net/?p=php-src.git;a=commit;h=01ea314e8cfd6c1c6a2c7db9e13be1f581f5d0a1
Log: Fix #62545: wrong unicode mapping in some charsets
 [2018-03-11 17:21 UTC] cmb@php.net
-Status: Open +Status: Closed
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Mar 19 11:01:28 2024 UTC