Bug #58159 UCS-2 conversions in transliterate() must specify LE
Submitted: 2008-04-15 15:06 UTC Modified: 2013-11-11 11:04 UTC
From: tech-php-bugs at sklar dot com Assigned: derick
Status: Closed Package: translit (PECL)
PHP Version: 5.2.5 OS: Solaris 10
Private report: No CVE-ID: None

 [2008-04-15 15:06 UTC] tech-php-bugs at sklar dot com
Description:
------------
transliterate() expects the UCS-2 strings it operates on to be in UCS-2LE (little-endian) encoding. If the iconv library that PHP uses treats the bare "ucs-2" encoding name as big-endian, transliterate() does not work as expected. GNU libiconv 1.12 is one such library.
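To illustrate the mismatch, here is a minimal C sketch that encodes U+00E9 (é) as a UCS-2 unit in both byte orders and then decodes the bytes as little-endian, which is the order transliterate() assumes. The helper names ucs2_encode() and ucs2_decode_le() are hypothetical and are not part of the translit extension.

```c
#include <assert.h>
#include <stdint.h>

/* Encode one BMP code point as a single UCS-2 unit in the requested byte order.
   UCS-2 cannot represent code points above U+FFFF. */
static void ucs2_encode(uint32_t cp, int little_endian, unsigned char out[2])
{
    assert(cp <= 0xFFFF);
    if (little_endian) {
        out[0] = (unsigned char)(cp & 0xFF);        /* low byte first */
        out[1] = (unsigned char)((cp >> 8) & 0xFF);
    } else {
        out[0] = (unsigned char)((cp >> 8) & 0xFF); /* high byte first */
        out[1] = (unsigned char)(cp & 0xFF);
    }
}

/* Read two bytes as a little-endian UCS-2 unit, the order transliterate() expects. */
static uint16_t ucs2_decode_le(const unsigned char in[2])
{
    return (uint16_t)(in[0] | (in[1] << 8));
}
```

Encoding é big-endian gives the bytes 0x00 0xE9; reading them back as little-endian yields 0xE900, a Private Use Area code point, so a rule keyed on U+00E9 would not match it.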




Reproduce code:
---------------
<?php
// "Actualités" in UTF-8: é (U+00E9) is the byte pair 0xC3 0xA9 (octal 0303 0251)
$s = 'Actualit'.chr(0303).chr(0251).'s';
print transliterate($s, array('diacritical_remove'), 'utf-8','ascii');
?>

The fix is to specify "ucs-2le" explicitly for both iconv conversions inside transliterate():

diff -u translit-0.6.0/translit.c.orig translit-0.6.0/translit.c      
--- translit-0.6.0/translit.c.orig      Tue Apr  1 18:36:38 2008
+++ translit-0.6.0/translit.c   Tue Apr 15 18:53:53 2008
@@ -136,7 +136,7 @@
 
        str_len_i = str_len;
        if (charset_in_name && charset_in_len) {
-               php_iconv_string(string, (size_t) str_len_i, (char **) &in, &str_len_o, "ucs-2", charset_in_name);
+               php_iconv_string(string, (size_t) str_len_i, (char **) &in, &str_len_o, "ucs-2le", charset_in_name);
                efree_it = 1;
        } else {
                str_len_o = str_len_i;
@@ -170,7 +170,7 @@
                char *tmp_charset_name;
                spprintf((char**) &tmp_charset_name, 128, "%s//IGNORE", charset_out_name);
        
-               php_iconv_string((char *) out, (size_t) (outl * 2), (char **) &tmp, (size_t*) &tmp_len, tmp_charset_name, "ucs-2");
+               php_iconv_string((char *) out, (size_t) (outl * 2), (char **) &tmp, (size_t*) &tmp_len, tmp_charset_name, "ucs-2le");
                RETVAL_STRINGL((unsigned char *)tmp, tmp_len, 1);
                free(out);
                efree(tmp);
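The same idea outside PHP, as a minimal sketch assuming a POSIX iconv implementation (glibc, musl, or GNU libiconv) that accepts the "UCS-2LE" name: spelling out the byte order makes the output independent of whatever the library chooses for a bare "UCS-2". The helper utf8_to_ucs2le() is hypothetical; it is not php_iconv_string().

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string to UCS-2LE.
   Returns the number of bytes written to dst, or (size_t)-1 on failure. */
static size_t utf8_to_ucs2le(const char *src, unsigned char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL);

    /* Name the byte order explicitly instead of relying on the default for "UCS-2". */
    iconv_t cd = iconv_open("UCS-2LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;      /* iconv() takes char **, not const char ** */
    size_t in_left = strlen(src);
    char *out = (char *)dst;
    size_t out_left = dst_size;

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;
    return dst_size - out_left;
}
```

For "Actualités" this writes 20 bytes (10 characters, 2 bytes each), with é stored as E9 00.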

Expected result:
----------------
Actualites


Actual result:
--------------
Actualits
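One plausible reading of the actual result: both conversions use the same (big-endian) byte order, so the little-endian filter sees é as 0xE900 and leaves it alone, the output conversion turns it back into é, and the final conversion to "ascii//IGNORE" silently drops it. The sketch below simulates that round trip under those assumptions; strip_diacritic() and run_pipeline() are hypothetical stand-ins for the extension's diacritical_remove filter and the iconv calls, not their real implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for diacritical_remove: maps a few accented
   letters to their base letter and passes every other unit through. */
static uint16_t strip_diacritic(uint16_t u)
{
    switch (u) {
    case 0x00E8: case 0x00E9: case 0x00EA: return 'e';
    case 0x00E0: case 0x00E2:              return 'a';
    default:                               return u;
    }
}

static uint16_t swap16(uint16_t u)
{
    return (uint16_t)((u >> 8) | (u << 8));
}

/* Simulate the pipeline on n UCS-2 units. When `swapped` is nonzero, each unit
   is byte-swapped before the filter and swapped back afterwards, as if both
   conversions produced big-endian data that the filter reads as little-endian.
   Non-ASCII units are then dropped, like "ascii//IGNORE". Writes a
   NUL-terminated string to out (at least n + 1 bytes) and returns its length. */
static size_t run_pipeline(const uint16_t *units, size_t n, int swapped, char *out)
{
    assert(units != NULL && out != NULL);
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t u = swapped ? swap16(units[i]) : units[i];
        u = strip_diacritic(u);
        if (swapped)
            u = swap16(u);
        if (u < 0x80)
            out[k++] = (char)u;
    }
    out[k] = '\0';
    return k;
}
```

With swapped = 0 the input "Actualités" becomes Actualites; with swapped = 1 it becomes Actualits, matching the expected and actual results above.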

History

 [2013-11-11 11:04 UTC] derick@php.net
Automatic comment from SVN on behalf of derick
Revision: http://svn.php.net/viewvc/?view=revision&revision=332107
Log: Fixed bug #58159: UCS-2 conversions in transliterate() must specify LE.

Although I couldn't reproduce this, the patch and idea are sound.
 [2013-11-11 11:04 UTC] derick@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: derick
 [2013-11-11 11:04 UTC] derick@php.net
The fix for this bug has been committed.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

 For Windows:

http://windows.php.net/snapshots/
 
Thank you for the report, and for helping us make PHP better.

I couldn't reproduce the problem, but the idea and patch are sound.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 21 14:01:32 2024 UTC