Bug #67055 output of json_encode does not comply with the definition
Submitted: 2014-04-10 12:50 UTC Modified: 2014-04-25 16:51 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:0 (0.0%)
From: mail+bugs dot php dot net at kazik dot de Assigned:
Status: Not a bug Package: JSON related
PHP Version: Irrelevant OS: All
Private report: No CVE-ID: None
 [2014-04-10 12:50 UTC] mail+bugs dot php dot net at kazik dot de
Description:
------------
The function json_encode does not encode strings according to the JSON definition. This affects at least PHP 5.3.28 up to the latest version (currently 5.5.11).

According to the definition, a string may not contain an unescaped quote, backslash or control character.
Control characters are the C0 set (0x00-0x1f), delete (0x7f) and the C1 set (0x80-0x9f) (see http://en.wikipedia.org/wiki/Unicode_control_characters).

Source: ext/json/json.c, function json_escape_string

The function only checks for the C0 set; it does not escape delete or the C1 set.

The C1 set is only left unescaped when the option JSON_UNESCAPED_UNICODE is used (available since PHP 5.4.0).


Test script:
---------------
echo json_encode(chr(0x7f)).PHP_EOL;

echo json_encode(chr(0xc2).chr(0x80), JSON_UNESCAPED_UNICODE).PHP_EOL; // the UTF-8 representation of \u0080


Expected result:
----------------
"\u007f"

"\u0080"


Actual result:
--------------
'"'.chr(0x7f).'"'

'"'.chr(0xc2).chr(0x80).'"' // the UTF-8 representation of \u0080
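
Because the raw bytes are invisible in most terminals, the behaviour described in this report is easier to confirm by hex-dumping the output. The following is a minimal sketch, not part of the original report; it only uses json_encode and bin2hex, and the variable names are illustrative.

```php
<?php
// DEL (0x7f) is emitted as a raw byte between the two quote characters (0x22).
$del = json_encode(chr(0x7f));
echo bin2hex($del), PHP_EOL; // 227f22

// U+0080 as UTF-8 is the byte pair 0xc2 0x80.
$c1 = chr(0xc2) . chr(0x80);

// Without flags, non-ASCII characters are escaped as \uXXXX.
echo json_encode($c1), PHP_EOL; // "\u0080"

// With JSON_UNESCAPED_UNICODE the two bytes are passed through unchanged.
echo bin2hex(json_encode($c1, JSON_UNESCAPED_UNICODE)), PHP_EOL; // 22c28022
```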


Patches

patch_json.diff (last revision 2014-04-10 12:51 UTC by mail+bugs dot php dot net at kazik dot de)

 [2014-04-25 01:22 UTC] pleasestand at live dot com
> The function json_encode does not encode strings accordingly to the json definition.

In RFC 7159, section "7. Strings" defines the "control characters" as only "U+0000 through U+001F". More specifically, the characters you mention are explicitly allowed to be left unescaped:

    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

Note that JavaScript's JSON.stringify() doesn't escape "\u007f" either.
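
The grammar rule quoted above can be restated as a small predicate. This is only a sketch under stated assumptions: json_must_escape is a hypothetical helper name (it is not part of ext/json), and the scalar type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: true if RFC 7159 section 7 requires the code point
// to be escaped inside a JSON string, i.e. it falls outside the
// "unescaped = %x20-21 / %x23-5B / %x5D-10FFFF" rule.
function json_must_escape(int $cp): bool
{
    return $cp < 0x20      // control characters U+0000 through U+001F
        || $cp === 0x22    // quotation mark
        || $cp === 0x5C;   // reverse solidus (backslash)
}

var_dump(json_must_escape(0x1F)); // bool(true)
var_dump(json_must_escape(0x7F)); // bool(false): DEL may stay unescaped
var_dump(json_must_escape(0x80)); // bool(false): so may the C1 set

// PHP's own decoder accepts a raw DEL byte inside a string.
var_dump(json_decode("\"\x7f\"") === "\x7f"); // bool(true)
```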
 [2014-04-25 16:51 UTC] aharvey@php.net
-Status: Open +Status: Not a bug
 [2014-04-25 16:51 UTC] aharvey@php.net
Yep; this is correct per the JSON spec.
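
For callers who still want DEL and the C1 set escaped (for example, because a downstream consumer mishandles them), the output of json_encode can be post-processed in userland. The following is a sketch only, not an official recommendation: json_encode_escape_c1 is a hypothetical wrapper name, and the code assumes PHP 7 or later and that json_encode produced valid UTF-8. Replacing these characters globally is safe in that case, because in valid JSON they can only occur inside string values.

```php
<?php
// Hypothetical wrapper: encode as usual, then rewrite U+007F..U+009F as \u00XX.
function json_encode_escape_c1($value, int $flags = 0): string
{
    $json = json_encode($value, $flags);
    if ($json === false) {
        throw new RuntimeException(json_last_error_msg());
    }

    return preg_replace_callback(
        '/[\x{7f}-\x{9f}]/u',
        function (array $m): string {
            if (strlen($m[0]) === 1) {
                // DEL is a single byte in UTF-8.
                $cp = ord($m[0]);
            } else {
                // C1 characters are the two-byte sequences 0xC2 0x80..0x9F.
                $cp = ((ord($m[0][0]) & 0x1F) << 6) | (ord($m[0][1]) & 0x3F);
            }
            return sprintf('\\u%04x', $cp);
        },
        $json
    );
}

echo json_encode_escape_c1(chr(0x7f)), PHP_EOL;                                      // "\u007f"
echo json_encode_escape_c1(chr(0xc2) . chr(0x80), JSON_UNESCAPED_UNICODE), PHP_EOL; // "\u0080"
```

Both forms decode to the same string, so the escaped output stays interchangeable with what json_encode emits by default.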
 