[2004-10-25 09:53 UTC] david at davidheath dot org
Description:
------------
MBstring appears to map some characters incorrectly for the following ISO-8859 charsets:

Encoding: ISO-8859-7
 incorrect mapping of char 0xa4: got 0x3f, expected 0x20ac
 incorrect mapping of char 0xa5: got 0x3f, expected 0x20af
 incorrect mapping of char 0xaa: got 0x3f, expected 0x37a
Encoding: ISO-8859-8
 incorrect mapping of char 0xaf: got 0x203e, expected 0xaf
 incorrect mapping of char 0xfd: got 0x3f, expected 0x200e
 incorrect mapping of char 0xfe: got 0x3f, expected 0x200f
Encoding: ISO-8859-10
 incorrect mapping of char 0xa4: got 0x124, expected 0x12a

This is based on the mappings provided at ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/ on 25th Oct 2004.

Note, there are undated comments in the "Version history" for the above files, as follows:

8859-7:
# 2.0 version updates 1.0 version by adding mappings for the
# three newly added characters 0xA4, 0xA5, 0xAA.

8859-8:
# 1.1 version updates to the published 8859-8:1999, correcting
# the mapping of 0xAF and adding mappings for LRM and RLM.

8859-10:
# 1.1 corrected mistake in mapping of 0xA4

So I guess these mappings have changed since mbstring was first written. I'm not sure if there would be a backward-compatibility problem if the mappings were changed.

Thanks
Dave

Reproduce code:
---------------
Code for this test is available at:
http://davidheath.org/mbstring/mbstring_test.tar.bz2

Expected result:
----------------
Mappings as stated "expected xxx" above.

Actual result:
--------------
Mappings as stated "got xxx" above.
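
The "expected" values in the report come from the unicode.org mapping tables, which list one byte per line as hex columns followed by a `#` comment. As a rough sketch of how such a table could be turned into the kind of array the test needs, the helper below parses text in that layout. The function name `parseMappingTable` and the inline sample are assumptions made for illustration; they are not part of the report or of mbstring.

```php
<?php
// Hypothetical helper (not from the original report): parse text laid out
// like the unicode.org ISO8859 mapping tables into [byte => code point].
function parseMappingTable(string $text): array
{
    $map = [];
    foreach (preg_split('/\R/', $text) as $line) {
        // Drop everything from '#' onwards (comments), then trim whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '') {
            continue; // blank or comment-only line
        }
        // Remaining columns: byte value, then Unicode code point (both hex).
        $cols = preg_split('/\s+/', $line);
        if (count($cols) < 2) {
            continue; // byte listed without a mapping
        }
        // intval() with base 16 accepts the leading "0x".
        $map[intval($cols[0], 16)] = intval($cols[1], 16);
    }
    return $map;
}

// Sample lines written in the table layout, using the three ISO-8859-7
// values quoted in the report.
$sample = "# sample lines in the 8859-7.TXT layout\n"
        . "0xA4\t0x20AC\t# EURO SIGN\n"
        . "0xA5\t0x20AF\t# DRACHMA SIGN\n"
        . "0xAA\t0x037A\t# GREEK YPOGEGRAMMENI\n";

foreach (parseMappingTable($sample) as $byte => $codePoint) {
    printf("0x%x => 0x%x\n", $byte, $codePoint);
}
```

Feeding the result of a full table to a check like the one in the report would cover every byte of a charset rather than only the hand-picked ones.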
oops, minor bug in that script. Line 35 should read:

printf(" incorrect mapping of char 0x%x: got 0x%x, expected 0x%x\n", $fromChar, $unicodeCharNumber[''], $expectChar);

Corrected version of script for your cut+paste convenience:

<?php

testMapping('ISO-8859-7', array( 0xa4=>0x20ac, 0xa5=>0x20af, 0xaa=>0x37a) );
testMapping('ISO-8859-8', array( 0xaf=>0xaf, 0xfd=>0x200e, 0xfe=>0x200f) );
testMapping('ISO-8859-10', array( 0xa4=>0x12a ) );

function testMapping($targetEncoding, $map) {
    print "Encoding: $targetEncoding\n";
    foreach($map as $fromChar=>$toChar) {
        $expectChar = $toChar;
        // convert to UCS-4, which represents every possible unicode
        // char as a single fixed width 32bit value
        $unicodeChar=mb_convert_encoding(chr($fromChar), 'UCS-4LE', $targetEncoding);
        $unicodeCharNumber = unpack('L', $unicodeChar);
        if ($expectChar!=$unicodeCharNumber[''] and ($expectChar!=0 and $unicodeCharNumber!=0x3f)) {
            printf(" incorrect mapping of char 0x%x: got 0x%x, expected 0x%x\n", $fromChar, $unicodeCharNumber[''], $expectChar);
        }
    }
}

?>
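
The script reads the unpacked value under an empty-string key and uses the `L` format, which follows the machine's byte order. Current PHP documentation says unnamed `unpack()` elements get numeric indices starting from 1, so that lookup may not work on newer releases. A minimal sketch of a more portable decoding step, assuming a helper named `ucs4leToCodePoint` (hypothetical, not part of the original script), could look like this:

```php
<?php
// Hypothetical helper, not part of the original script: turn the 4 bytes
// produced for 'UCS-4LE' into an integer code point.
function ucs4leToCodePoint(string $bytes): int
{
    if (strlen($bytes) !== 4) {
        throw new InvalidArgumentException('expected exactly 4 bytes');
    }
    // 'V' is always an unsigned 32-bit little-endian value, matching
    // UCS-4LE on any machine; 'L' depends on the machine's byte order.
    // Naming the element ("cp") avoids relying on the default array key.
    $fields = unpack('Vcp', $bytes);
    return $fields['cp'];
}

// U+20AC encoded as UCS-4LE is the byte sequence AC 20 00 00.
printf("0x%x\n", ucs4leToCodePoint("\xac\x20\x00\x00"));
```

In the script above, this would wrap the existing conversion call, for example `ucs4leToCodePoint(mb_convert_encoding(chr($fromChar), 'UCS-4LE', $targetEncoding))`, and the result could then be compared directly against `$expectChar`.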