Bug #75986 JSON parser not following proposed RFC
Submitted: 2018-02-20 14:00 UTC Modified: 2018-02-20 17:10 UTC
From: welfordmartin at gmail dot com Assigned:
Status: Not a bug Package: JSON related
PHP Version: 7.2.2 OS: Ubuntu Server 16.04 & Windows 10
Private report: No CVE-ID: None
 [2018-02-20 14:00 UTC] welfordmartin at gmail dot com
Description:
------------
When json_decode is used to decode a string value containing "{" or "}", validation fails and no object is produced. This happens even when both are UTF-8 encoded as "\x7B" and "\x7D" respectively. Both cases cause a 'JSON_ERROR_SYNTAX'.

The JSON has been manually checked against RFC 4627, which says strings follow standard C conventions. I can't believe I have to point this out to you, but "{" and "}" are both ASCII chars and so are valid in a string.

The JSON also parses correctly in web browsers: the Chrome, Edge and IE parsers all accept it.

JSON Code used:
{
	"defaultModel":"Campaign",
	"routes" : {
		"getCampaignsForWebsite":{
			"scope"	: "read_campaigns",
			"api":{
				"method" : "GET", 
				"uri" : "/campaign/\x7BdisplayType\x7D/\x7Bwebsite_id:\d+\x7D"
			},
			"perms":{
				"module" : "website",
				"requirement" : "read" 
			}
		},
		"getSingleCampaign":{
			"scope"	: "read_campaigns",
			"api":{
				"method" : "GET", 
				"uri" : "/campaign/\x7Bcampaign_id:\d+\x7D"
			},
			"perms":{
				"module" : "campaign", 
				"requirement" : "read" 
			}
		}
	}
}



Test script:
---------------
<?php
// Load the JSON document shown above and decode it into an associative array.
$raw = file_get_contents("test.json");
$json = json_decode($raw, true);

// Report which error, if any, the decoder recorded for the last call.
switch (json_last_error()) {
	case JSON_ERROR_NONE:
		echo ' - No errors';
		break;
	case JSON_ERROR_DEPTH:
		echo ' - Maximum stack depth exceeded';
		break;
	case JSON_ERROR_STATE_MISMATCH:
		echo ' - Underflow or the modes mismatch';
		break;
	case JSON_ERROR_CTRL_CHAR:
		echo ' - Unexpected control character found';
		break;
	case JSON_ERROR_SYNTAX:
		echo ' - Syntax error, malformed JSON';
		break;
	case JSON_ERROR_UTF8:
		echo ' - Malformed UTF-8 characters, possibly incorrectly encoded';
		break;
	default:
		echo ' - Unknown error';
		break;
}

Expected result:
----------------
I expect an array result when using assoc or a stdClass object when not.

Actual result:
--------------
json_decode returns null and json_last_error() returns JSON_ERROR_SYNTAX.

 [2018-02-20 14:07 UTC] nikic@php.net
-Status: Open +Status: Not a bug
 [2018-02-20 14:07 UTC] nikic@php.net
Your JSON contains invalid escape sequences. \d is not valid JSON. \x7D is not valid JSON. The correct syntax is \\d and \u007D respectively.
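
A minimal sketch of the difference, assuming PHP 7.0 or later. The sample documents are shortened from the report and are illustrative only; they are written as nowdoc strings so that PHP itself leaves every backslash untouched.

```php
<?php
// JSON text as posted in the report: \x7B, \x7D and \d are not JSON escapes.
$invalid = <<<'JSON'
{"uri": "/campaign/\x7Bcampaign_id:\d+\x7D"}
JSON;

// Corrected JSON text: braces as \u007B / \u007D, and the backslash escaped as \\.
$valid = <<<'JSON'
{"uri": "/campaign/\u007Bcampaign_id:\\d+\u007D"}
JSON;

var_dump(json_decode($invalid, true));               // NULL
var_dump(json_last_error() === JSON_ERROR_SYNTAX);   // bool(true)

$decoded = json_decode($valid, true);
echo $decoded['uri'], "\n";                          // /campaign/{campaign_id:\d+}
```

Literal "{" and "}" would work just as well as the \u007B and \u007D escapes; only the backslash before "d" has to be doubled.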
 [2018-02-20 14:10 UTC] welfordmartin at gmail dot com
The JSON is valid according to https://www.freeformatter.com/json-validator.html

However, it fails on https://jsonlint.com/ and https://jsonformatter.curiousconcept.com/; the latter says it's an invalid character at [Code 18, Structure 25], highlighting the UTF-8 chars as the problem. So I think both of them are also using a validator that does not conform to RFC 4627.
 [2018-02-20 14:22 UTC] nikic@php.net
Yes, the freeformatter.com checker is non-conforming. Please see the "char" production on page 4 of the cited RFC 4627.

For some more fun: http://seriot.ch/json/parsing.html Most JSON parsers are buggy in one way or another, so always take things with a grain of salt. To the best of our knowledge, the PHP 7 JSON parser is fully conforming (though the PHP 5 one is not).
 [2018-02-20 14:44 UTC] welfordmartin at gmail dot com
You mean page 5.

But even that says "unescaped" or "escape" followed by one of:

%x22 /          ; "    quotation mark  U+0022
%x5C /          ; \    reverse solidus U+005C
%x2F /          ; /    solidus         U+002F
%x62 /          ; b    backspace       U+0008
%x66 /          ; f    form feed       U+000C
%x6E /          ; n    line feed       U+000A
%x72 /          ; r    carriage return U+000D
%x74 /          ; t    tab             U+0009
%x75 4HEXDIG )  ; uXXXX                U+XXXX

Since backslash d (\d) is not one of the listed allowed escapes, shouldn't it be evaluated as an unescaped (literal) "\" followed by "d"? That is exactly what the standard says: in "unescaped / escape ( ... )" the "/" explicitly means "or".

Also, the UTF-8 encoded version was something I tried afterwards; I meant to set it back to braces before posting it in the bug.
 [2018-02-20 14:50 UTC] nikic@php.net
> Since backslash d (\d) is not one of the listed allowed escapes, shouldn't it be evaluated as an unescaped (literal) "\" followed by "d"? That is exactly what the standard says: in "unescaped / escape ( ... )" the "/" explicitly means "or".

No, "\" may be followed **only** by the characters listed there. Note that the "unescaped" production explicitly excludes "\".

> Also, the UTF-8 encoded version was something I tried afterwards; I meant to set it back to braces before posting it in the bug.

Did you use both \\d and {} or \uXXXX escape sequences? If you still had \d the JSON would still be invalid.
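
The grammar argument above can be sketched as a regular expression over the RFC 4627 "string" production. This is only an illustration under stated assumptions: isRfc4627String is a hypothetical helper, not part of PHP, and it checks a single quoted JSON string rather than a whole document.

```php
<?php
// Hypothetical helper: does $text match  quotation-mark *char quotation-mark ?
function isRfc4627String(string $text): bool
{
    // unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
    // i.e. anything except control characters, '"' and '\'.
    $unescaped = '[^\x{00}-\x{1F}"\\\\]';
    // escape ( " / \ / "/" / b / f / n / r / t / u 4HEXDIG )
    $escaped = '\\\\(?:["\\\\/bfnrt]|u[0-9A-Fa-f]{4})';
    $pattern = '~\A"(?:' . $unescaped . '|' . $escaped . ')*+"\z~u';

    return preg_match($pattern, $text) === 1;
}

// Single-quoted PHP strings keep "\d", "\x" and "\u" as written.
var_dump(isRfc4627String('"/campaign/{id:\d+}"'));     // bool(false): "\d" is not a listed escape
var_dump(isRfc4627String('"/campaign/{id:\\\\d+}"'));  // bool(true): JSON text contains "\\d"
var_dump(isRfc4627String('"\x7B"'));                   // bool(false): "\x" is not a listed escape
var_dump(isRfc4627String('"\u007B"'));                 // bool(true)
```

Because "\" is excluded from the unescaped class, a backslash can only be consumed by the escape alternative, which is why "\d" has no way to match.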
 [2018-02-20 14:50 UTC] welfordmartin at gmail dot com
Also, since JSON stands for JavaScript Object Notation, it stands to reason that all of the JavaScript engines currently in use do this correctly and don't escape things that are not listed as an escape. So everything else implementing it should do that as well.
 [2018-02-20 15:02 UTC] welfordmartin at gmail dot com
>No, "\" may be followed **only** by the characters listed there. Note that the "unescaped" production explicitly excludes "\".

Please tell me where in that specification it states "only". It states "unescaped / escape ( ... )", which in English means "unescaped or escape ( ... )", and JavaScript engines agree with that statement, as they succeed in parsing that JSON. Even IE.

Also on this note, see the specification of strings on page 4:

>The representation of strings is similar to conventions used in the C
>family of programming languages.  A string begins and ends with
>quotation marks.  All Unicode characters may be placed within the
>quotation marks except for the characters that must be escaped:
>quotation mark, reverse solidus, and the control characters (U+0000 through U+001F) 

Since \d is not a valid escape, it should be treated as literal, the same as the conventions in C family languages. Most if not all of them, including PHP, fall back to the literal characters when a string escape is not valid.
 [2018-02-20 15:10 UTC] nikic@php.net
> Please tell me where in that specification it states "only". It states "unescaped / escape ( ... )", which in English means "unescaped or escape ( ... )", and JavaScript engines agree with that statement, as they succeed in parsing that JSON. Even IE.

I'm sorry, I don't know how I can explain this to you. There is simply no way for this grammar to accept the sequence "\d". It does not match as "unescaped" because "\" is not allowed in unescaped. It does not match "escaped", because "\" cannot be followed by "d".

>The representation of strings is similar to conventions used in the C
>family of programming languages.  A string begins and ends with
>quotation marks.  All Unicode characters may be placed within the
>quotation marks except for the characters that must be escaped:
>quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)

This quite explicitly says "[...] except for the characters that must be escaped: [...] reverse solidus [...]". "\" is a reverse solidus. "\" must be escaped.

> Since \d is not a valid escape, it should be treated as literal, the same as the conventions in C family languages. Most if not all of them, including PHP, fall back to the literal characters when a string escape is not valid.

It's nice that you think this way, but this is not what the JSON specification says and consequently not what PHP does.

Also, your claim that JavaScript implementations follow the behavior you describe is incorrect. If you write JSON.parse("\"\\d\"") you will receive a syntax error. You need to use JSON.parse("\"\\\\d\"") instead.

What you probably tried is to simply parse the JSON as JS, which has entirely different and much less strict rules. If you want to parse JS, please use a JS parser, not a JSON parser.
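
The same two layers of escaping can be reproduced from PHP. This is a small sketch assuming PHP 7.0 or later; the variable names are illustrative. The double-quoted PHP literal is unescaped first, and only the resulting text is handed to the JSON parser.

```php
<?php
// PHP turns "\\" into one backslash, so the JSON text is "\d" (invalid).
$oneBackslash = "\"\\d\"";
// PHP turns "\\\\" into two backslashes, so the JSON text is "\\d" (valid).
$twoBackslashes = "\"\\\\d\"";

var_dump(json_decode($oneBackslash));    // NULL
echo json_last_error_msg(), "\n";        // Syntax error
var_dump(json_decode($twoBackslashes));  // string(2) "\d"
```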
 [2018-02-20 16:21 UTC] welfordmartin at gmail dot com
If that is the case, the PHP parser is still incorrect, as "/" works without escaping even though it is listed on the escape list as

>%x2F /          ; /    solidus         U+002F

So shouldn't it force me to escape a forward slash as "\/" (as json_encode does)?

Also, sorry: I have just opened up the HTML version of that RFC and seen that it has been obsoleted twice now. The current standard is: https://tools.ietf.org/html/rfc8259
 [2018-02-20 17:10 UTC] nikic@php.net
> If that is the case, the PHP parser is still incorrect, as "/" works without escaping even though it is listed on the escape list as
>
> >%x2F /          ; /    solidus         U+002F
>
> So shouldn't it force me to escape a forward slash as "\/" (as json_encode does)?

This just means that writing "\/" is permitted, not that it is required. The important part is what is in the "unescaped" production: It says "%x20-21 / %x23-5B / %x5D-10FFFF" which basically means "everything apart from control characters, the double quote (%x22) and \ (%x5C)". Those are the only things that *must* be escaped, everything else *can* be escaped using either one of the predefined escape sequences or the general \uXXXX escape sequence.
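
A short sketch of "permitted, not required", assuming PHP 7.0 or later; the sample strings are illustrative. Both spellings of the solidus decode to the same value, and json_encode only escapes it by default as a matter of output style.

```php
<?php
// "/" and "\/" are both valid JSON and decode to the same PHP string.
var_dump(json_decode('"a/b"') === json_decode('"a\/b"'));  // bool(true)

// json_encode escapes "/" by default; JSON_UNESCAPED_SLASHES turns that off.
echo json_encode('a/b'), "\n";                             // "a\/b"
echo json_encode('a/b', JSON_UNESCAPED_SLASHES), "\n";     // "a/b"

// Characters that need no escaping may still be written as \uXXXX.
var_dump(json_decode('"\u007B\u007D"'));                   // string(2) "{}"
```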
 
Last updated: Wed Dec 04 03:01:30 2024 UTC