Doc Bug #81362 iconv UTF-8//IGNORE fails to strip "\xF5\x80\x80\x80"
Submitted: 2021-08-15 15:34 UTC
Modified: 2021-08-16 12:44 UTC
From: divinity76 at gmail dot com
Assigned:
Status: Open
Package: ICONV related
PHP Version: 8.0.9
OS:
Private report: No
CVE-ID: None

 [2021-08-15 15:34 UTC] divinity76 at gmail dot com
it reproduces on "PHP 8.0.3 + iconv 2.28 + Ubuntu 20.04",
and it also reproduces on every version from PHP 5.6.0 to PHP 8.0.9 inclusive, though with unknown iconv/OS.

(the string also appears to trigger a bug in mb_check_encoding() that was fixed in 5.6.0?)

interestingly, it does *not* reproduce on "PHP 7.3.7-for-cygwin + iconv 1.0 + Windows 10"; there it correctly strips the string down to string(3) "PHP"

is it possibly the result of a bug introduced sometime after iconv 1.0 and no later than iconv 2.28?

Test script:


Expected result:
string(3) "PHP"
string(6) "504850"

Actual result:
string(7) "����PHP"
string(14) "f5808080504850"


 [2021-08-15 15:42 UTC]
-Type: Bug +Type: Documentation Problem
 [2021-08-15 15:42 UTC]
The latest GNU libiconv version is 1.16[1].  The iconv()
documentation[2] already warns about //TRANSLIT being
implementation specific; the same applies to //IGNORE.  I suggest
that you always build against GNU libiconv; system implementations
are known to be limited (or even buggy).

So no, not an implementation bug, but rather a doc issue.

[1] <>
[2] <>
 [2021-08-15 16:02 UTC] divinity76 at gmail dot com
@cmb that's interesting, any idea where the builds get their "iconv version 2.28" from?

but it's not just that they ignore the //IGNORE directive: they understand "//IGNORE" and yet don't strip this sequence. for example, they all correctly strip "\xf5\x80\x80PHP" down to "PHP", so //IGNORE isn't being ignored.
 [2021-08-16 12:44 UTC]
Presumably iconv (or at least this iconv implementation) doesn't consider encoded codepoints > U+10FFFF to be invalid.
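
To make the point about code points above U+10FFFF concrete, here is a minimal sketch in Python (not PHP, and not taken from any iconv source). The helper name decode_four_byte_form is hypothetical. It mechanically applies the 4-byte UTF-8 bit layout with no range check, which is roughly what a lenient converter would be doing if the explanation above is right. Under that assumption, "\xF5\x80\x80\x80" is a structurally complete sequence that just happens to encode a value past the Unicode limit, while "\xf5\x80\x80PHP" is a truncated sequence and is rejected on structure alone.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines


def decode_four_byte_form(seq: bytes) -> int:
    """Apply the 4-byte UTF-8 bit layout (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
    without checking that the result is a valid Unicode code point.
    Hypothetical helper, for illustration only."""
    if len(seq) != 4:
        raise ValueError("expected exactly 4 bytes")
    lead, tail = seq[0], seq[1:]
    if lead & 0xF8 != 0xF0:
        raise ValueError("lead byte does not match 11110xxx")
    if any(b & 0xC0 != 0x80 for b in tail):
        raise ValueError("continuation byte does not match 10xxxxxx")
    code_point = lead & 0x07          # 3 payload bits from the lead byte
    for b in tail:
        code_point = (code_point << 6) | (b & 0x3F)  # 6 payload bits each
    return code_point


# The sequence from this report: well-formed bit pattern, out-of-range value.
cp = decode_four_byte_form(b"\xf5\x80\x80\x80")
print(hex(cp), cp > MAX_CODE_POINT)   # 0x140000 True

# The largest valid 4-byte sequence, for comparison.
print(hex(decode_four_byte_form(b"\xf4\x8f\xbf\xbf")))  # 0x10ffff

# The truncated variant fails the structural check instead:
# 'P' (0x50) sits where a continuation byte is required.
try:
    decode_four_byte_form(b"\xf5\x80\x80P")
except ValueError as exc:
    print("rejected:", exc)

# A strict decoder (Python's, used here only as a reference point) never
# accepts 0xF5 as a lead byte, so ignoring errors leaves just "PHP".
strict_full = b"\xf5\x80\x80\x80PHP".decode("utf-8", errors="ignore")
strict_truncated = b"\xf5\x80\x80PHP".decode("utf-8", errors="ignore")
print(strict_full, strict_truncated)  # PHP PHP
```

The last two lines match the expected result in the report: a decoder that enforces the U+10FFFF limit drops all four bytes, whereas one that only checks the bit pattern would pass them through.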