Bug #58159 UCS-2 conversions in transliterate() must specify LE
Submitted: 2008-04-15 15:06 UTC
Modified: 2013-11-11 11:04 UTC
From: tech-php-bugs at sklar dot com
Assigned: derick
Status: Closed
Package: translit (PECL)
PHP Version: 5.2.5
OS: Solaris 10
Private report: No
CVE-ID: None
 [2008-04-15 15:06 UTC] tech-php-bugs at sklar dot com
Description:
------------
transliterate() expects the UCS-2 strings it operates on to be in UCS-2LE (little-endian) encoding. If the iconv library that PHP is using treats the "ucs-2" encoding as big-endian by default, transliterate() doesn't work as expected. GNU libiconv-1.12 is one such library.
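
To illustrate the byte-order mismatch described above, here is a minimal sketch in C. It does not call iconv or translit; the helper names (ucs2_encode, ucs2le_decode, seen_by_le_reader) are hypothetical and exist only for this illustration. It shows which 16-bit code unit a little-endian reader sees when the producer wrote UCS-2 in either byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write one BMP code point as two UCS-2 bytes
 * in the requested byte order (big_endian != 0 means UCS-2BE). */
void ucs2_encode(uint16_t cp, int big_endian, unsigned char *out)
{
    assert(out != NULL);
    if (big_endian) {
        out[0] = (unsigned char)(cp >> 8);   /* high byte first */
        out[1] = (unsigned char)(cp & 0xFF);
    } else {
        out[0] = (unsigned char)(cp & 0xFF); /* low byte first */
        out[1] = (unsigned char)(cp >> 8);
    }
}

/* Hypothetical helper: read two bytes the way a consumer that
 * assumes UCS-2LE would. */
uint16_t ucs2le_decode(const unsigned char *in)
{
    return (uint16_t)(in[0] | (in[1] << 8));
}

/* The code unit a little-endian reader sees when the producer
 * used the given byte order. */
uint16_t seen_by_le_reader(uint16_t cp, int producer_big_endian)
{
    unsigned char buf[2];
    ucs2_encode(cp, producer_big_endian, buf);
    return ucs2le_decode(buf);
}
```

With these helpers, seen_by_le_reader(0x00E9, 0) yields 0x00E9 (é), while seen_by_le_reader(0x00E9, 1) yields 0xE900, a byte-swapped value that no longer identifies é. A plausible reading of the result reported below, offered as an assumption rather than something confirmed in the translit source, is that the filter then fails to recognize the character and it is later discarded when converting to ASCII with //IGNORE.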




Reproduce code:
---------------
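<?php
// "Actualités" in UTF-8: chr(0303).chr(0251) are the octal bytes 0xC3 0xA9, which encode U+00E9 (é).
$s = 'Actualit'.chr(0303).chr(0251).'s';
print transliterate($s, array('diacritical_remove'), 'utf-8','ascii');
?>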

The fix is to explicitly specify ucs-2le for the conversions inside transliterate():

diff -u translit-0.6.0/translit.c.orig translit-0.6.0/translit.c      
--- translit-0.6.0/translit.c.orig      Tue Apr  1 18:36:38 2008
+++ translit-0.6.0/translit.c   Tue Apr 15 18:53:53 2008
@@ -136,7 +136,7 @@
 
        str_len_i = str_len;
        if (charset_in_name && charset_in_len) {
-               php_iconv_string(string, (size_t) str_len_i, (char **) &in, &str_len_o, "ucs-2", charset_in_name);
+               php_iconv_string(string, (size_t) str_len_i, (char **) &in, &str_len_o, "ucs-2le", charset_in_name);
                efree_it = 1;
        } else {
                str_len_o = str_len_i;
@@ -170,7 +170,7 @@
                char *tmp_charset_name;
                spprintf((char**) &tmp_charset_name, 128, "%s//IGNORE", charset_out_name);
        
-               php_iconv_string((char *) out, (size_t) (outl * 2), (char **) &tmp, (size_t*) &tmp_len, tmp_charset_name, "ucs-2");
+               php_iconv_string((char *) out, (size_t) (outl * 2), (char **) &tmp, (size_t*) &tmp_len, tmp_charset_name, "ucs-2le");
                RETVAL_STRINGL((unsigned char *)tmp, tmp_len, 1);
                free(out);
                efree(tmp);

Expected result:
----------------
Actualites


Actual result:
--------------
Actualits

 [2013-11-11 11:04 UTC] derick@php.net
Automatic comment from SVN on behalf of derick
Revision: http://svn.php.net/viewvc/?view=revision&revision=332107
Log: Fixed bug #58159: UCS-2 conversions in transliterate() must specify LE.

Although I couldn't reproduce this, the patch and idea are sound.
 [2013-11-11 11:04 UTC] derick@php.net
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: derick
 [2013-11-11 11:04 UTC] derick@php.net
The fix for this bug has been committed.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.

 For Windows:

http://windows.php.net/snapshots/
 
Thank you for the report, and for helping us make PHP better.

I couldn't reproduce the problem, but the idea and patch are sound.
 