Bug #30549 incorrect character translations for some ISO-8859 charsets
Submitted: 2004-10-25 09:53 UTC Modified: 2005-02-21 08:50 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: david at davidheath dot org Assigned: moriyoshi
Status: Closed Package: mbstring related
PHP Version: 4.3.9 OS: linux
Private report: No CVE-ID: None
 [2004-10-25 09:53 UTC] david at davidheath dot org
Description:
------------
mbstring appears to map some characters incorrectly in the following ISO-8859 charsets:

Encoding: ISO-8859-7
  incorrect mapping of char 0xa4: got 0x3f, expected 0x20ac
  incorrect mapping of char 0xa5: got 0x3f, expected 0x20af
  incorrect mapping of char 0xaa: got 0x3f, expected 0x37a
Encoding: ISO-8859-8
  incorrect mapping of char 0xaf: got 0x203e, expected 0xaf
  incorrect mapping of char 0xfd: got 0x3f, expected 0x200e
  incorrect mapping of char 0xfe: got 0x3f, expected 0x200f
Encoding: ISO-8859-10
  incorrect mapping of char 0xa4: got 0x124, expected 0x12a

This is based on the mappings provided at ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/ on 25th Oct 2004. 
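The unicode.org tables are plain text with one mapping per line: the charset byte in hex, the Unicode code point in hex, then a `#` comment with the character name. A minimal sketch for loading such a table into a PHP array, so it can be compared against mbstring's output; the function name `parseUnicodeMapping` is hypothetical, and the sample lines are written in the format of those files rather than copied from them.

```php
<?php
// Hypothetical helper: parse a unicode.org ISO-8859 mapping table
// (lines such as "0xA4<TAB>0x20AC<TAB>#<TAB>EURO SIGN") into an
// array of charset byte => Unicode code point.
function parseUnicodeMapping($text)
{
    $map = array();
    foreach (preg_split('/\R/', $text) as $line) {
        // Comment-only and blank lines do not match and are skipped.
        if (preg_match('/^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/', $line, $m)) {
            $map[hexdec($m[1])] = hexdec($m[2]);
        }
    }
    return $map;
}

// Sample in the same format (assumed), covering the three 8859-7 additions.
$sample = "#\tVersion history comment\n"
        . "0xA4\t0x20AC\t#\tEURO SIGN\n"
        . "0xA5\t0x20AF\t#\tDRACHMA SIGN\n"
        . "0xAA\t0x037A\t#\tGREEK YPOGEGRAMMENI\n";

$map = parseUnicodeMapping($sample);
printf("0x%X => U+%04X\n", 0xA4, $map[0xA4]);
```

Run against the real 8859-7 file, the resulting array can be passed directly as the `$map` argument of a checker like the `testMapping()` function in the thread.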

Note that the "Version history" sections of the above files contain the following undated comments:

8859-7:
#	2.0 version updates 1.0 version by adding mappings for the
#	three newly added characters 0xA4, 0xA5, 0xAA.

8859-8:
#       1.1 version updates to the published 8859-8:1999, correcting
#          the mapping of 0xAF and adding mappings for LRM and RLM.

8859-10:
#       1.1 corrected mistake in mapping of 0xA4

So I guess these mappings have changed since mbstring was first written. I'm not sure whether changing the mappings would cause a backward-compatibility problem.
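On the backward-compatibility question: code that must behave the same on PHP builds with and without updated tables could probe the running mbstring at runtime. A minimal sketch, assuming the mbstring extension is loaded; the function name `hasUpdatedGreekTable` is hypothetical. Per this report, an old table yields `?` (0x3f) for ISO-8859-7 byte 0xA4 instead of U+20AC EURO SIGN.

```php
<?php
// Hypothetical probe: does this build's mbstring map ISO-8859-7 byte 0xA4
// to U+20AC EURO SIGN, as in the 2.0 unicode.org table?
function hasUpdatedGreekTable()
{
    // U+20AC encoded as UTF-8 is the three bytes E2 82 AC.
    $converted = mb_convert_encoding("\xA4", 'UTF-8', 'ISO-8859-7');
    return $converted === "\xE2\x82\xAC";
}

// Only call it when mbstring is actually available.
if (extension_loaded('mbstring')) {
    var_dump(hasUpdatedGreekTable());
}
```

The same pattern works for the other charsets, for example checking that ISO-8859-10 byte 0xA4 converts to U+012A (UTF-8 `C4 AA`).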

Thanks

Dave


Reproduce code:
---------------
Code for this test is available at:

http://davidheath.org/mbstring/mbstring_test.tar.bz2


Expected result:
----------------
Mappings as stated "expected xxx" above.

Actual result:
--------------
Mappings as stated "got xxx" above.

 [2004-10-25 10:33 UTC] derick@php.net
Hello David,

can you please make a *short* script that shows the warnings are wrong? It takes quite some time to figure out what exactly your script is doing.

regards,
Derick
 [2004-10-25 13:32 UTC] david at davidheath dot org
oops, minor bug in that script. Line 35 should read:

            printf("  incorrect mapping of char 0x%x: got 0x%x, expected 0x%x\n", $fromChar, $unicodeCharNumber[''], $expectChar);

Corrected version of script for your cut+paste convenience:

<?php

testMapping('ISO-8859-7',
            array(
                0xa4=>0x20ac,
                0xa5=>0x20af,
                0xaa=>0x37a)
            );

testMapping('ISO-8859-8',
            array(
                0xaf=>0xaf,
                0xfd=>0x200e,
                0xfe=>0x200f)
            );

testMapping('ISO-8859-10',
            array(
                0xa4=>0x12a
                )
            );

function testMapping($targetEncoding, $map) {
    print "Encoding: $targetEncoding\n";

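    foreach($map as $fromChar=>$expectChar) {
        // convert to UCS-4LE, which represents every possible Unicode
        // char as a single fixed-width 32-bit little-endian value
        $unicodeChar = mb_convert_encoding(chr($fromChar), 'UCS-4LE', $targetEncoding);

        // 'V' reads an unsigned 32-bit little-endian value, matching UCS-4LE
        // on any host ('L' uses the host's byte order); current() takes the
        // value regardless of the key unpack() assigns to it
        $unicodeCharNumber = current(unpack('V', $unicodeChar));

        // report a mismatch, unless no mapping is expected (0) and
        // mbstring substituted '?' (0x3f) for the unmapped byte
        if ($unicodeCharNumber != $expectChar
            and !($expectChar == 0 and $unicodeCharNumber == 0x3f)) {
            printf("  incorrect mapping of char 0x%x: got 0x%x, expected 0x%x\n", $fromChar, $unicodeCharNumber, $expectChar);
        }
    }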
    }
}
?>
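The reproduce script turns mbstring's UCS-4LE output into an integer with `unpack()`, and the format letter matters there: `L` uses the host's byte order, while `V` is always little-endian, so the two only agree on little-endian hosts such as x86. A minimal sketch of portable decoding; the function name `ucs4leToCodePoint` is hypothetical, and rather than relying on the array key `unpack()` assigns to an unnamed element, it takes the value with `current()`.

```php
<?php
// 'L' packs in host byte order, 'V' always little-endian; they give the
// same bytes only when the host itself is little-endian.
$hostIsLittleEndian = (pack('L', 1) === pack('V', 1));

// Hypothetical helper: decode one 4-byte UCS-4LE unit into a code point.
function ucs4leToCodePoint($bytes)
{
    return current(unpack('V', $bytes));
}

printf("host little-endian: %s\n", $hostIsLittleEndian ? 'yes' : 'no');
// U+20AC EURO SIGN in UCS-4LE is the bytes AC 20 00 00.
printf("U+%04X\n", ucs4leToCodePoint("\xAC\x20\x00\x00"));
```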
 [2005-02-11 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2005-02-17 15:09 UTC] moriyoshi@php.net
Verified.
 [2005-02-21 08:50 UTC] moriyoshi@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 