|  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #46944 json_encode chokes on characters outside the BMP
Submitted: 2008-12-26 15:39 UTC Modified: 2009-01-02 03:05 UTC
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:0 (0.0%)
From: anomie at users dot sourceforge dot net Assigned: scottmac
Status: Closed Package: JSON related
PHP Version: 5.3CVS-2008-12-26 (snap) OS: Linux
Private report: No CVE-ID:
 [2008-12-26 15:39 UTC] anomie at users dot sourceforge dot net
json_encode encodes characters above U+1FFFF incorrectly; sometimes it incorrectly encodes them as characters in the U+10000-U+1FFFF range, and sometimes it just errors out.

Note this is not an error with the source not being UTF8; as you can see below, I am building the UTF8-encoded text byte-by-byte.

5.2.6 has the same problem, although instead of null it returns "aa" for those cases due to bug 43941.

It looks like there are actually two unrelated bugs here:
1. utf8_to_utf16 in ext/json/utf8_to_utf16.c should use "c -= 0x10000;" at line 49 instead of "c &= 0xFFFF;". This causes the part where it incorrectly encodes values over U+1FFFF as U+10000-U+1FFFF.
2. utf8_decode_next in ext/json/utf8_decode.c should use 0xF8 instead of 0xF1 at line 168. This causes the part where UTF8 characters beginning with an F1 or F3 byte error out.

Reproduce code:
for($i=1; $i<=16; $i++){
    print json_encode("aa".chr(0xf0|($i>>2)).chr(0x8f|($i&3)<<4)."\xbf\xbdzz")."\n";

Expected result:

Actual result:


Add a Patch

Pull Requests

Add a Pull Request


AllCommentsChangesGit/SVN commitsRelated reports
 [2009-01-02 03:05 UTC]
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
Thank you for the report, and for helping us make PHP better.

PHP Copyright © 2001-2015 The PHP Group
All rights reserved.
Last updated: Fri Nov 27 06:02:02 2015 UTC