Bug #74230 iconv fails to fail on surrogates
Submitted: 2017-03-09 10:48 UTC Modified: 2017-03-27 18:24 UTC
From: paul dot crovella at gmail dot com Assigned: ab (profile)
Status: Closed Package: ICONV related
PHP Version: 7.1.2 OS: Windows
Private report: No CVE-ID: None
 [2017-03-09 10:48 UTC] paul dot crovella at gmail dot com
Surrogate codepoints cannot be validly encoded in UTF-8. Iconv should return false when given a byte sequence that would encode them and an in_charset of 'UTF-8'.

This works as expected on Linux but not on Windows (tested with PHP 7.0.16 and 7.1.2).

Test script:
$high = "\xED\xA1\x92"; // code point U+D852 (high surrogate)
$low = "\xED\xBD\xA2"; // code point U+DF62 (low surrogate)
$pair = $high.$low;
var_dump(
    @\iconv('UTF-8', 'UTF-8', $high) === false,
    @\iconv('UTF-8', 'UTF-8', $low) === false,
    @\iconv('UTF-8', 'UTF-8', $pair) === false
);

Expected result:

Actual result:


 [2017-03-13 12:41 UTC]
Thanks for the report. This seems to be a libiconv issue. The library used is quite old, but surprisingly libiconv-1.15 was released a month ago after 6 years of silence :) I'm going to check with it. Otherwise, in 7.1 you might get an acceptable result with sapi_windows_cp_conv().

 [2017-03-13 20:29 UTC] paul dot crovella at gmail dot com
Wow, and their very first bullet point for that release is:

 - The UTF-8 converter now rejects surrogates and out-of-range code points.

Talk about timing :D
 [2017-03-27 18:24 UTC]
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: ab
 [2017-03-27 18:24 UTC]
The deps were updated with libiconv-1.15 for all 7.x branches. For testing, you can currently fetch any of the latest master snapshots, and of course the upcoming RCs this week.
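
For builds that still bundle the older libiconv, a byte-level pre-check is one possible stopgap. The sketch below is only an illustration: `has_encoded_surrogate()` and `iconv_rejecting_surrogates()` are hypothetical helper names, not part of PHP or libiconv. It relies on the fact that a UTF-8-style encoding of a surrogate (U+D800 to U+DFFF) always has the byte form ED A0..BF 80..BF, and it assumes the caller only wants this extra check when the input charset is given as 'UTF-8'.

```php
<?php
// Hypothetical helper (not part of PHP or libiconv): reports whether $bytes
// contains a UTF-8-style encoding of a surrogate code point (U+D800..U+DFFF).
// Such sequences always have the byte form ED A0..BF 80..BF.
function has_encoded_surrogate(string $bytes): bool
{
    // No /u modifier: the pattern is matched against raw bytes.
    return preg_match('/\xED[\xA0-\xBF][\x80-\xBF]/', $bytes) === 1;
}

// Hypothetical wrapper: behaves like iconv(), but returns false up front
// when the input charset is UTF-8 and the input contains an encoded surrogate.
function iconv_rejecting_surrogates(string $from, string $to, string $input)
{
    if (strcasecmp($from, 'UTF-8') === 0 && has_encoded_surrogate($input)) {
        return false;
    }
    return @iconv($from, $to, $input);
}

$high = "\xED\xA1\x92"; // U+D852
$low = "\xED\xBD\xA2";  // U+DF62
var_dump(
    iconv_rejecting_surrogates('UTF-8', 'UTF-8', $high) === false,
    iconv_rejecting_surrogates('UTF-8', 'UTF-8', $low) === false,
    iconv_rejecting_surrogates('UTF-8', 'UTF-8', $high.$low) === false
);
```

The check covers surrogates only; other malformed input is still left to iconv itself, so this does not replace updating to a build with libiconv-1.15.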
