Surrogate codepoints (U+D800 to U+DFFF) cannot be validly encoded in UTF-8. iconv() should return false when in_charset is 'UTF-8' and the input contains a byte sequence that would encode one of them.
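To see why the test strings below are invalid, it helps to decode their bytes by hand. This is a minimal sketch: `decode3()` and `isSurrogate()` are hypothetical helpers written for illustration, not part of PHP or iconv, and `decode3()` deliberately does no validation so the would-be codepoint is visible. A three-byte UTF-8 sequence has the form 1110xxxx 10xxxxxx 10xxxxxx, and the payload bits are concatenated.

```php
<?php
// Hypothetical helper: decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)
// WITHOUT validating it, so we can see which codepoint the bytes would represent.
function decode3(string $bytes): int
{
    $b0 = ord($bytes[0]);
    $b1 = ord($bytes[1]);
    $b2 = ord($bytes[2]);
    return (($b0 & 0x0F) << 12) | (($b1 & 0x3F) << 6) | ($b2 & 0x3F);
}

// Hypothetical helper: surrogates occupy U+D800..U+DFFF and exist only for UTF-16.
function isSurrogate(int $cp): bool
{
    return $cp >= 0xD800 && $cp <= 0xDFFF;
}

foreach (["\xED\xA1\x92", "\xED\xBD\xA2"] as $seq) {
    $cp = decode3($seq);
    printf("U+%04X surrogate=%s\n", $cp, isSurrogate($cp) ? 'yes' : 'no');
}
// prints:
// U+D852 surrogate=yes
// U+DF62 surrogate=yes
```

In general, any three-byte sequence with lead byte 0xED and a second byte in 0xA0..0xBF lands in the surrogate range, which is why a strict UTF-8 decoder must reject it.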
This works as expected on Linux (https://3v4l.org/Ff6cr) but not on Windows (tested with PHP 7.0.16 and 7.1.2).
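<?php
$high = "\xED\xA1\x92"; // encodes U+D852 (high surrogate)
$low  = "\xED\xBD\xA2"; // encodes U+DF62 (low surrogate)
$pair = $high . $low;
// Each conversion should fail, so each line should print bool(true).
var_dump(@\iconv('UTF-8', 'UTF-8', $high) === false);
var_dump(@\iconv('UTF-8', 'UTF-8', $low) === false);
var_dump(@\iconv('UTF-8', 'UTF-8', $pair) === false);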
Thanks for the report. This appears to be a libiconv issue. The bundled library is quite old, but surprisingly libiconv-1.15 was released a month ago after 6 years of silence :) I'm going to check with it. Otherwise, in 7.1 you might get an acceptable result with sapi_windows_cp_conv().
Wow, and their very first bullet point for that release is:
- The UTF-8 converter now rejects surrogates and out-of-range code points.
Talk about timing :D
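For reference, here is a minimal pure-PHP sketch of what "rejects surrogates and out-of-range code points" means in practice. It is an illustration only, not libiconv's implementation; the function name `isStrictUtf8()` is hypothetical. Besides surrogates (U+D800..U+DFFF) and code points above U+10FFFF, it also rejects overlong forms and truncated sequences, which any strict UTF-8 decoder must do.

```php
<?php
// Hypothetical strict UTF-8 validator (illustrative; not libiconv's code).
function isStrictUtf8(string $s): bool
{
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b0 = ord($s[$i]);
        if ($b0 < 0x80) {                      // 1-byte ASCII
            $i++;
            continue;
        }
        // Lead byte decides how many continuation bytes follow and the
        // smallest codepoint that form may legally encode (anti-overlong).
        if ($b0 >= 0xC2 && $b0 <= 0xDF) {
            $n = 1; $cp = $b0 & 0x1F; $min = 0x80;
        } elseif ($b0 >= 0xE0 && $b0 <= 0xEF) {
            $n = 2; $cp = $b0 & 0x0F; $min = 0x800;
        } elseif ($b0 >= 0xF0 && $b0 <= 0xF4) {
            $n = 3; $cp = $b0 & 0x07; $min = 0x10000;
        } else {
            return false;                      // stray continuation, C0/C1, or F5..FF
        }
        if ($i + $n >= $len) {
            return false;                      // truncated sequence
        }
        for ($k = 1; $k <= $n; $k++) {
            $b = ord($s[$i + $k]);
            if (($b & 0xC0) !== 0x80) {
                return false;                  // expected a continuation byte
            }
            $cp = ($cp << 6) | ($b & 0x3F);
        }
        if ($cp < $min || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
            return false;                      // overlong, out of range, or surrogate
        }
        $i += $n + 1;
    }
    return true;
}

var_dump(isStrictUtf8("\xED\xA1\x92"));             // bool(false): U+D852
var_dump(isStrictUtf8("\xED\xA1\x92\xED\xBD\xA2")); // bool(false): surrogate pair
var_dump(isStrictUtf8("\xF0\xA4\xAD\xA2"));         // bool(true): U+24B62
```

The last call shows the valid way to write the character that the surrogate pair D852 DF62 stands for in UTF-16: U+24B62, encoded in UTF-8 as a single four-byte sequence rather than two three-byte surrogate encodings.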
The dependencies were updated to libiconv-1.15 for all 7.x branches. You can currently fetch any recent master snapshot for testing, and of course the upcoming RCs this week.