Bug #48147 iconv with //IGNORE cuts the string
Submitted: 2009-05-04 14:52 UTC Modified: 2015-05-08 07:23 UTC
Avg. Score:4.3 ± 0.8
Reproduced:10 of 12 (83.3%)
Same Version:8 (80.0%)
Same OS:7 (70.0%)
From: kulakov74 at yandex dot ru Assigned: stas (profile)
Status: Closed Package: ICONV related
PHP Version: 5.*, 6CVS (2009-05-05) OS: Linux
Private report: No CVE-ID: None

 [2009-05-04 14:52 UTC] kulakov74 at yandex dot ru
iconv() without //IGNORE is known to cut the string at the first illegal character, but with //IGNORE it should not. Still, I get truncated text, and not at the point where the illegal character is. Sorry, the actual PHP version is 5.2.6, and I cannot upgrade it. Can you test this with the latest version? Please download the file from

Reproduce code:
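$Body1=... //read the file

// Without //IGNORE: conversion stops at the first illegal character
$Body2=iconv('UTF-8', 'ISO-8859-1', $Body1);

// With //IGNORE: illegal characters should be skipped, not truncate the output
$Body2=iconv('UTF-8', 'ISO-8859-1//IGNORE', $Body1);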

Expected result:
Notice: iconv(): Detected an illegal character in input string in /home/doldon/html/tdnam/dev.php on line 18
15321 - I can get this if I use //TRANSLIT or when I run the test on my home Windows PHP 4

Actual result:
Notice: iconv(): Detected an illegal character in input string in /home/doldon/html/tdnam/dev.php on line 18
Notice: iconv(): Detected an illegal character in input string in /home/doldon/html/tdnam/dev.php on line 18


 [2009-05-06 05:13 UTC] kulakov74 at yandex dot ru
Here goes the script. I'm not sure about the limit on external resources - I have the file to convert, so it is downloaded. 




$Body2=iconv('UTF-8', 'ISO-8859-1', $Body1);

$Body2=iconv('UTF-8', 'ISO-8859-1//IGNORE', $Body1);

 [2009-05-06 14:38 UTC]
It just means you're using the glibc iconv implementation, which does not
have the IGNORE parameter implemented.
 [2009-05-06 18:18 UTC] kulakov74 at yandex dot ru
No. The fact that the script displays the notice "iconv(): Detected an illegal character ..." in both cases is unrelated to whether the option is implemented: this is controlled by error_reporting(E_ALL). The IGNORE option only controls whether iconv stops at the character or not. 

Also, the length of the resulting string is different (greater) with IGNORE, and while without it the string ends at exactly where the illegal character is, with IGNORE it ends at a random point where no such characters occur. 

Also, I did not mention - this is not the only file I converted, many others were converted correctly with the option, and their length only decreased a little. But there were 2 files which were truncated, 1 of them (the smaller) is used for the test case. 

Can you run the test with the latest PHP releases? Actually this is why I reported the bug. I tried it on other servers with PHP 4.3.3, 5.1.4, 5.1.6, 5.2.4 and 5.2.6 and yep! - I finally found one with 5.2.9 (built Feb 27 2009) and it displayed the same results everywhere. 

I repeat, the TRANSLIT option works fine, even though it does the same and more.
 [2009-05-06 18:36 UTC]
Arnaud: Please don't reopen bogus bugs without explanation. 
 [2009-05-07 07:50 UTC]
Marked it as verified as I got exactly the same results:

The first iconv() call (the one without //IGNORE) fails on the ellipsis character "…" (value="Search…"), which can't be represented in ISO-8859-1.

The second iconv() call (the one with //IGNORE) fails later (so the ellipsis is ignored, which may mean that the //IGNORE flag is supported), and there is no apparent reason for it to fail at offset 8157 (only regular ASCII chars around).
 [2009-05-07 13:58 UTC]
We still can't fix bugs in the glibc iconv implementation. Try this on
the command line and you get the same results:

# iconv -f utf-8 -t iso-8859-1 iconv.html > /dev/null
iconv: illegal input sequence at position 3589

# iconv -f utf-8 -t iso-8859-1//IGNORE iconv.html > /dev/null
iconv: illegal input sequence at position 8168

 [2011-12-18 19:37 UTC]
Not broken in latest version of libiconv

ezyang@javelin:~/Desktop/libiconv-1.14/src$ ./iconv_no_i18n --version
iconv (GNU libiconv 1.14)
Copyright (C) 2000-2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Bruno Haible.
ezyang@javelin:~/Desktop/libiconv-1.14/src$ ./iconv_no_i18n -f utf-8 -t iso-8859-1//IGNORE ~/iconv.html | wc -c
ezyang@javelin:~/Desktop/libiconv-1.14/src$ iconv -f utf-8 -t iso-8859-1//IGNORE ~/iconv.html | wc -c
iconv: illegal input sequence at position 8168
 [2011-12-23 00:49 UTC]
-Status: Bogus +Status: Re-Opened
 [2011-12-23 00:49 UTC]
I think I understand how to fix this bug, without modifying glibc. We need to modify our invocation of iconv in order to mirror the behavior of iconv_prog.c:process_block() when the '-c' flag is set (if we mimic the code closely enough, we also get an extra bonus of sensible block processing behavior, which is better than the horrible over-allocation iconv does right now). In particular, we need to handle the EILSEQ error code correctly.
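The EILSEQ handling described above can be sketched in C as follows. This is a minimal illustration, not the code from iconv_prog.c or from PHP's iconv extension: the function name convert_skipping_invalid and its signature are hypothetical, it assumes a stateless multibyte input encoding such as UTF-8 (so skipping one byte at a time is safe), and it assumes an iconv() that reports unconvertible characters with EILSEQ, as glibc and GNU libiconv do. The target charset is passed without //IGNORE; the loop does the skipping itself.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper (an assumption for illustration, not PHP's code):
 * convert `in` from charset `from` to charset `to`, dropping every input
 * byte that iconv() rejects with EILSEQ.
 * Returns the number of bytes written to `out`, or (size_t)-1 if the
 * charsets are unknown or `out` is too small.
 */
size_t convert_skipping_invalid(const char *from, const char *to,
                                const char *in, size_t in_len,
                                char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    /* iconv() takes char **, but it does not modify the input bytes. */
    char *in_p = (char *)in;
    size_t in_left = in_len;
    char *out_p = out;
    size_t out_left = out_cap;

    while (in_left > 0) {
        if (iconv(cd, &in_p, &in_left, &out_p, &out_left) != (size_t)-1)
            break;                      /* all remaining input was converted */

        if (errno == EILSEQ && in_left > 0) {
            /* Illegal or unconvertible sequence: skip one byte and retry. */
            in_p++;
            in_left--;
            continue;
        }
        if (errno == EINVAL)
            break;                      /* truncated sequence at the end: drop it */

        iconv_close(cd);                /* E2BIG or an unexpected error */
        return (size_t)-1;
    }

    /* Flush any pending shift state (a no-op for stateless encodings). */
    if (iconv(cd, NULL, NULL, &out_p, &out_left) == (size_t)-1) {
        iconv_close(cd);
        return (size_t)-1;
    }

    iconv_close(cd);
    return out_cap - out_left;
}
```

Called with from = "UTF-8" and to = "ISO-8859-1" on text containing "Search…", this sketch would be expected to drop the three bytes of "…" and keep converting the text that follows, rather than stopping at the first character that ISO-8859-1 cannot represent.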
 [2012-01-08 12:33 UTC]
-Status: Re-Opened +Status: Feedback
 [2012-01-08 12:33 UTC]
To me it looks like there is no bug (as stated in the redhat issues). Also, even if
there were one, it would be an iconv bug, not a PHP bug.

Or do you have any information that shows that PHP is causing this problem here?
 [2012-10-27 09:26 UTC]
I submitted an updated bug to glibc, which correctly describes the incorrect behavior in glibc

The facts of the matter are as follows:

1) glibc has inconsistent behavior about what the EILSEQ error code is supposed to mean, between its documentation and its behavior
2) glibc and libiconv have different behavior
3) A user of PHP who would like to use iconv to convert between two character sets while ignoring malformed characters *cannot do so* with the most recent versions of PHP (5.4+). (Trust me, I've tried.) In old versions of PHP, this functionality was available. Thus, this bug is a regression.

If you want to blame upstream, that's fine by me, but I'm not optimistic about glibc getting updated any time in the near future, and there is a well understood (and implemented elsewhere) fix which gives us the correct behavior.
 [2013-02-18 00:33 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.
 [2015-05-08 07:22 UTC]
-Assigned To: +Assigned To: stas
 [2015-05-08 07:23 UTC]
-Status: No Feedback +Status: Assigned
 [2015-05-10 02:29 UTC]
Automatic comment on behalf of stas
Log: Fix #48147 - implement manual handling of  //IGNORE for broken libc
 [2015-05-10 02:29 UTC]
-Status: Assigned +Status: Closed
 [2016-07-20 11:38 UTC]
Automatic comment on behalf of stas
Log: Fix #48147 - implement manual handling of  //IGNORE for broken libc
Last updated: Thu Mar 04 02:01:24 2021 UTC