Bug #30549 incorrect character translations for some ISO-8859 charsets
Submitted: 2004-10-25 09:53 UTC
Modified: 2005-02-21 08:50 UTC
From: david at davidheath dot org
Assigned: moriyoshi
Status: Closed
Package: mbstring related
PHP Version: 4.3.9
OS: linux
Private report: No
CVE-ID:
 [2004-10-25 09:53 UTC] david at davidheath dot org
Description:
------------
mbstring appears to map some characters incorrectly for the following ISO-8859 charsets:

Encoding: ISO-8859-7
  incorrect mapping of char 0xa4: got 0x3f, expected 0x20ac
  incorrect mapping of char 0xa5: got 0x3f, expected 0x20af
  incorrect mapping of char 0xaa: got 0x3f, expected 0x37a
Encoding: ISO-8859-8
  incorrect mapping of char 0xaf: got 0x203e, expected 0xaf
  incorrect mapping of char 0xfd: got 0x3f, expected 0x200e
  incorrect mapping of char 0xfe: got 0x3f, expected 0x200f
Encoding: ISO-8859-10
  incorrect mapping of char 0xa4: got 0x124, expected 0x12a

This is based on the mappings provided at ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/ on 25th Oct 2004. 

Note, there are undated comments in the "Version history" for the above files, as follows:

8859-7:
#	2.0 version updates 1.0 version by adding mappings for the
#	three newly added characters 0xA4, 0xA5, 0xAA.

8859-8:
#       1.1 version updates to the published 8859-8:1999, correcting
#          the mapping of 0xAF and adding mappings for LRM and RLM.

8859-10:
#       1.1 corrected mistake in mapping of 0xA4

So I guess these mappings have changed since mbstring was first written. I'm not sure whether changing the mappings in mbstring would cause a backward-compatibility problem.

Thanks

Dave


Reproduce code:
---------------
Code for this test is available at:

http://davidheath.org/mbstring/mbstring_test.tar.bz2
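
The tarball itself is not reproduced here. As a rough sketch of the idea, the expected values can be read from the Unicode mapping tables, whose data lines hold the charset byte and the Unicode code point as two hex columns followed by a "#" comment. The function name parseMappingTable and the sample text below are illustrative assumptions, not taken from the original test code:

```php
<?php
// Hypothetical helper: turn the text of an 8859-x mapping table into an
// array of (charset byte => Unicode code point).
function parseMappingTable($text) {
    $map = array();
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        // Data lines look like "0xA4<tab>0x20AC<tab># EURO SIGN".
        // Comment lines and blank lines do not match and are skipped.
        if (preg_match('/^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/', $line, $m)) {
            $map[hexdec($m[1])] = hexdec($m[2]);
        }
    }
    return $map;
}

// Illustrative sample in the same layout as the 8859-7 table.
$sample = "# sample header comment\n"
        . "0xA4\t0x20AC\t# EURO SIGN\n"
        . "0xA5\t0x20AF\t# DRACHMA SIGN\n"
        . "0xAA\t0x037A\t# GREEK YPOGEGRAMMENI\n";

foreach (parseMappingTable($sample) as $byte => $codePoint) {
    printf("0x%02x => 0x%04x\n", $byte, $codePoint);
}
```

The resulting array has the same shape as the byte-to-code-point maps used when checking what mbstring returns for each byte.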


Expected result:
----------------
Mappings as stated "expected xxx" above.

Actual result:
--------------
Mappings as stated "got xxx" above.

History

 [2004-10-25 10:33 UTC] derick@php.net
Hello David,

Can you please make a *short* script that shows that the warnings are wrong? It takes quite some time to figure out exactly what your script is doing.

regards,
Derick
 [2004-10-25 13:32 UTC] david at davidheath dot org
oops, minor bug in that script. Line 35 should read:

            printf("  incorrect mapping of char 0x%x: got 0x%x, expected 0x%x\n", $fromChar, $unicodeCharNumber[''], $expectChar);

Corrected version of script for your cut+paste convenience:

<?php

testMapping('ISO-8859-7',
            array(
                0xa4=>0x20ac,
                0xa5=>0x20af,
                0xaa=>0x37a)
            );

testMapping('ISO-8859-8',
            array(
                0xaf=>0xaf,
                0xfd=>0x200e,
                0xfe=>0x200f)
            );

testMapping('ISO-8859-10',
            array(
                0xa4=>0x12a
                )
            );

function testMapping($targetEncoding, $map) {
    print "Encoding: $targetEncoding\n";

    foreach($map as $fromChar=>$toChar) {
        $expectChar = $toChar;

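        // Convert to UCS-4LE, which represents every Unicode character
        // as a single fixed-width 32-bit little-endian value.
        $unicodeChar = mb_convert_encoding(chr($fromChar), 'UCS-4LE', $targetEncoding);
        // 'V' always unpacks little-endian; 'L' would use the machine's byte order.
        // The unnamed element is read under the key '' below, which is where
        // the PHP version in this report stores it.
        $unicodeCharNumber = unpack('V', $unicodeChar);

        // Report every mismatch, including characters mapped to '?' (0x3f).
        if ($expectChar != $unicodeCharNumber['']) {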
            printf("  incorrect mapping of char 0x%x: got 0x%x, expected 0x%x\n", $fromChar, $unicodeCharNumber[''], $expectChar);
        }
    }
}
?>
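
The script above reads the result of unpack() under the key ''. A variant that does not depend on that key, offered only as a sketch, names the unpacked element and collects the mismatches in an array. The helper names codePointOf and findMismatches are hypothetical and not part of the original script:

```php
<?php
// Hypothetical helper: decode one byte of $encoding and return its
// Unicode code point as an integer.
function codePointOf($byte, $encoding) {
    // UCS-4LE is fixed-width 32-bit little-endian, so 'V' matches it
    // regardless of the machine's own byte order.
    $ucs4 = mb_convert_encoding(chr($byte), 'UCS-4LE', $encoding);
    // Naming the element ("code") gives a predictable array key.
    $parts = unpack('Vcode', $ucs4);
    return $parts['code'];
}

// Hypothetical helper: return (byte => code point actually produced) for
// every byte whose mapping differs from $expected.
function findMismatches($encoding, $expected) {
    $mismatches = array();
    foreach ($expected as $byte => $want) {
        $got = codePointOf($byte, $encoding);
        if ($got !== $want) {
            $mismatches[$byte] = $got;
        }
    }
    return $mismatches;
}

$expected = array(0xa4 => 0x20ac, 0xa5 => 0x20af, 0xaa => 0x37a);
print "Encoding: ISO-8859-7\n";
foreach (findMismatches('ISO-8859-7', $expected) as $byte => $got) {
    printf("  incorrect mapping of char 0x%x: got 0x%x, expected 0x%x\n",
           $byte, $got, $expected[$byte]);
}
```

On a build that contains the fix, the loop should print nothing after the "Encoding:" line. On an affected build it should list the same characters as the report.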
 [2005-02-11 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2005-02-17 15:09 UTC] moriyoshi@php.net
Verified.
 [2005-02-21 08:50 UTC] moriyoshi@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 