Bug #16069 ICONV transliteration failure
Submitted: 2002-03-14 09:40 UTC Modified: 2002-08-03 20:25 UTC
From: readjust at deneb dot freemail dot ne dot jp Assigned:
Status: Closed Package: ICONV related
PHP Version: 4.3.0-dev OS: win32, Linux
Private report: No CVE-ID: None
 [2002-03-14 09:40 UTC] readjust at deneb dot freemail dot ne dot jp
Conversion between CP932 (a variant of the Shift_JIS charset) and any Japanese charset other than CP932 unexpectedly fails when transliteration mode is requested on the output encoding, as in "EUC-JP//TRANSLIT", and the transliterated output needs a larger buffer than strlen(input_buf) * sizeof(ucs4_t).

Test script:
<?php
// "\x87\x6D" is the CP932 byte pair for SQUARE MIRIBAARU (0x876D),
// "\x87\x65" is the one for SQUARE AARU (0x8765).
for( $i = 0; $i < 20; ++$i ) {
  print $i.":".iconv( "EUC-JP", "Shift_JIS", iconv( "CP932", "EUC-JP//TRANSLIT", "abcd".str_repeat( "\x87\x6D", $i ) ) )."<BR>";
}
for( $i = 0; $i < 20; ++$i ) {
  print $i.":".iconv( "EUC-JP", "Shift_JIS", iconv( "CP932", "EUC-JP//TRANSLIT", "abcd".str_repeat( "\x87\x65", $i ) ) )."<BR>";
}
?>

Each repeated string is ONE CP932 character: "SQUARE MIRIBAARU" (0x876D) and "SQUARE AARU" (0x8765), as listed in http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT


 [2002-03-14 20:56 UTC] yohgaki@php.net
I fixed it a while ago for systems whose libc supports iconv.
(Recent Linux/glibc is one of them.)

On systems that use libiconv, the problem is still there.
(I didn't fix it for libiconv, since I don't use libiconv ;)

 [2002-03-15 03:21 UTC] readjust at deneb dot freemail dot ne dot jp
Yes, I know I am an exception :) and most PHP users don't have to care about this problem. Besides, it seems that glibc iconv cannot handle CP932 characters properly.
 [2002-05-24 20:28 UTC] derick@php.net
This bug has been fixed in CVS. You can grab a snapshot of the
CVS version at http://snaps.php.net/. In case this was a documentation 
problem, the fix will show up soon at http://www.php.net/manual/.
In case this was a PHP.net website problem, the change will show
up on the PHP.net site and on the mirror sites.
Thank you for the report, and for helping us make PHP better.


 [2002-05-25 07:04 UTC] derick@php.net
User reports:

But I think this problem has not been completely solved yet, because
the libiconv support code is still the old one.
I confirmed this by running the test script I posted in #16069 on a binary
built from the latest CVS version.
libiconv is my only choice when I work with PHP on Windows.
Or should I doubt that the libiconv support was done at all?

As far as I know, libiconv reports errors through errno even in its win32 port.

Thanks,

Moriyoshi Koizumi

 [2002-05-26 14:50 UTC] derick@php.net
Does this now work as expected in PHP 4.2.1?

 [2002-05-26 19:59 UTC] readjust at deneb dot freemail dot ne dot jp
No, it doesn't for now, at least on my box. I tried with 4.3.0-dev and it resulted in a segv, apparently because of a buffer overrun.
 [2002-05-27 18:21 UTC] yohgaki@php.net
I think there should be no such problem on recent Linux,
i.e. when iconv is provided by libc, the length of the resulting string is correct. (Requires 4.2.0 or later.)

When libiconv is used, there will be a problem, since that code just allocates org_string_len*sizeof(UCS4 width), which cannot always be
correct (and is an inefficient use of memory).

So please note which iconv you're using. I may fix it if the libc iconv support is broken.

PS: If libiconv supports errno (and errno should be supported on all platforms we support), we can just get rid of the libiconv-specific code and use the libc iconv code for both.
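A minimal sketch of the errno-based approach, in C against POSIX iconv(3): instead of trusting a precomputed size such as in_len * sizeof(ucs4_t), the loop grows the output buffer whenever iconv() fails with E2BIG, and it always keeps one spare byte for the terminating NUL. The function name convert_growing, the initial size guess and the doubling policy are assumptions for illustration; this is not PHP's actual php_iconv_string() code.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: convert in[0..in_len) from charset `from` to `to`.
 * Returns a malloc'd, NUL-terminated buffer (caller frees) and stores the
 * converted length in *out_len, or returns NULL on any error. */
static char *convert_growing(const char *in, size_t in_len,
                             const char *to, const char *from,
                             size_t *out_len)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t out_size = in_len + 1;           /* deliberately small first guess */
    char *out = malloc(out_size + 1);       /* +1: room for the final NUL */
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_p = (char *)in;                /* iconv() takes char ** */
    size_t in_left = in_len;
    char *out_p = out;
    size_t out_left = out_size;
    int flushing = 0;                       /* phase 2: emit shift-state reset */

    for (;;) {
        size_t r = flushing
            ? iconv(cd, NULL, NULL, &out_p, &out_left)
            : iconv(cd, &in_p, &in_left, &out_p, &out_left);
        if (r != (size_t)-1) {
            if (flushing)
                break;                      /* all input converted and flushed */
            flushing = 1;
            continue;
        }
        if (errno != E2BIG) {               /* EILSEQ or EINVAL: give up */
            free(out);
            iconv_close(cd);
            return NULL;
        }
        /* Output buffer full: double it and resume where iconv() stopped. */
        size_t used = (size_t)(out_p - out);
        char *bigger = realloc(out, out_size * 2 + 1);
        if (bigger == NULL) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
        out = bigger;
        out_size *= 2;
        out_p = out + used;
        out_left = out_size - used;
    }

    iconv_close(cd);
    *out_len = (size_t)(out_p - out);
    assert(*out_len <= out_size);           /* so out[*out_len] stays in bounds */
    out[*out_len] = '\0';
    return out;
}
```

For example, converting the 4-byte ASCII string "abcd" from UTF-8 to UTF-16LE starts with a 5-byte buffer, hits E2BIG, doubles to 10 bytes and finishes with 8 bytes of output plus the NUL. On systems where iconv lives in GNU libiconv rather than in libc, link with -liconv.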



 [2002-05-27 23:15 UTC] readjust at deneb dot freemail dot ne dot jp
Sorry for bothering you again. I don't want to make things more complicated because I have no problem using PHP at the moment.
Yasuo's libc iconv support looks quite o.k.
I should say that in some cases I have to use libiconv, as I mentioned before: libc iconv doesn't transliterate CP932-specific characters into their alternatives when converting them into other Japanese charsets. So if you are dealing with libc iconv, you don't have to care about this in the first place.
In some variants of MSVCRT, errno is handled as a thread-local variable, as in glibc.
 [2002-07-05 14:45 UTC] sniper@php.net
Doesn't segfault with 4.3.0-dev (Linux).
I guess this can be closed?

 [2002-07-05 16:39 UTC] readjust at deneb dot freemail dot ne dot jp
I have just tried again with the HEAD.

--------------------------------
$ ./buildconf
$ ./configure --with-iconv=/usr/lib \
 --prefix=/home/koizumi/local
$ make
$ make install
$ /home/koizumi/local/bin/php -q test.php

(13 lines printed)

Segmentation fault
--------------------------------
Backtrace:

#0  0x401f2e8f in chunk_free (ar_ptr=0x402a6620, p=0x1c775d9f) at malloc.c:3225
#1  0x401f2bf4 in __libc_free (mem=0x81d1a48) at malloc.c:3154
#2  0x40035602 in libiconv_close () from /usr/lib/libiconv.so.2
#3  0x08062fe9 in php_iconv_string (in_p=0x81d17cc "", in_len=32,
    out=0xbfffd060, out_len=0xbfffd064, in_charset=0x81d66b4 "CP932",
    out_charset=0x81d0db4 "EUC-JP//TRANSLIT", err=0xbfffd068)
    at /home/koizumi/src/php4/ext/iconv/iconv.c:194
#4  0x0806308f in php_if_iconv (ht=3, return_value=0x81d0d24, this_ptr=0x0,
    return_value_used=1) at /home/koizumi/src/php4/ext/iconv/iconv.c:292

(gdb) select-frame 3
(gdb) print out_size
$1 = 144
(gdb) print out_left
$2 = 0
--------------------------------

Now I see what is to blame for the segv:

------------------------------------------
    *out_len = out_size - out_left;
    out_buffer[*out_len] = '\0';
    icv_close(cd);
------------------------------------------

Here's the patch.
(I'll resend it if you think that would be better.)

===================================================================
RCS file: /repository/php4/ext/iconv/iconv.c,v
retrieving revision 1.39
diff -u -r1.39 iconv.c
--- iconv.c     28 Jun 2002 07:12:32 -0000      1.39
+++ iconv.c     5 Jul 2002 20:35:50 -0000
@@ -164,7 +164,7 @@
          I added 15 extra bytes for safety. <yohgaki@php.net>
        */
        out_size = in_len * sizeof(ucs4_t) + 16;
-       out_buffer = (char *) emalloc(out_size);
+       out_buffer = (char *) emalloc(out_size+1);

        *out = out_buffer;
        out_p = out_buffer;
==============================================

THIS PATCH DOESN'T REALLY FIX THE BUG #16069.
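The gdb values (in_len = 32, out_size = 144, out_left = 0) fit a simple model of the first test loop, which also shows why the extra byte is not the whole fix. A minimal sketch in C, assuming a 4-byte ucs4_t and that each SQUARE MIRIBAARU transliterates to 10 EUC-JP bytes (presumably five two-byte katakana); that figure is inferred from the gdb session, not taken from libiconv's tables. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the rev 1.39 sizing in php_iconv_string(),
 * assuming a 4-byte ucs4_t: out_size = in_len * sizeof(ucs4_t) + 16. */
static size_t rev139_out_size(size_t in_len) { return in_len * 4 + 16; }

/* Test script input for iteration i: "abcd" plus i two-byte CP932 characters. */
static size_t script_in_len(size_t i) { return 4 + 2 * i; }

/* Assumed EUC-JP output: "abcd" plus 10 bytes per SQUARE MIRIBAARU
 * (inferred from the gdb values, not from libiconv's tables). */
static size_t script_out_len(size_t i) { return 4 + 10 * i; }

enum outcome { FITS, NUL_PAST_END, NEEDS_MORE_ROOM };

/* extra = 0 models emalloc(out_size); extra = 1 models emalloc(out_size+1). */
static enum outcome classify(size_t i, size_t extra)
{
    size_t out_size = rev139_out_size(script_in_len(i)); /* room given to iconv() */
    size_t alloc = out_size + extra;                       /* bytes actually allocated */
    size_t need = script_out_len(i);

    if (need > out_size)
        return NEEDS_MORE_ROOM;   /* iconv() stops with E2BIG; conversion can't finish */
    if (need + 1 > alloc)
        return NUL_PAST_END;      /* out_buffer[need] = '\0' is one past the end */
    return FITS;
}

/* Cross-check against the gdb session: in_len 32 and out_size 144 at i = 14. */
static_assert(4 + 2 * 14 == 32, "in_len at i = 14");
static_assert((4 + 2 * 14) * 4 + 16 == 144, "out_size at i = 14");
```

In this model the extra byte removes the overrun at i = 14, where the output fills all 144 bytes and out_left is 0. From i = 15 on, however, the output (10i + 4 bytes) outgrows out_size (8i + 32 bytes) no matter how much slack is reserved for the NUL, so a real fix has to grow the buffer rather than pad it.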
 [2002-07-06 19:28 UTC] sniper@php.net
Not fixed. (Updated the version too.)

 [2002-07-10 23:32 UTC] yohgaki@php.net
This bug has been fixed in CVS. You can grab a snapshot of the
CVS version at http://snaps.php.net/. In case this was a documentation 
problem, the fix will show up soon at http://www.php.net/manual/.
In case this was a PHP.net website problem, the change will show
up on the PHP.net site and on the mirror sites.
Thank you for the report, and for helping us make PHP better.


 [2002-08-03 09:34 UTC] msopacua at idg dot nl
If the test is correct, this still fails on BSD/OS.

Test log sent to Yohgaki directly, to preserve the charset.
 [2002-08-03 20:25 UTC] yohgaki@php.net
IIRC, libiconv is needed to perform transliteration with CP932.
 [2002-08-04 05:46 UTC] readjust at deneb dot freemail dot ne dot jp
No need to worry if PHP is built with an iconv library that doesn't properly handle CP932 (especially glibc iconv).
But in that case the test should have been SKIPped rather than FAILed; possibly a bug in the unit test.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Mar 19 08:01:29 2024 UTC