php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #45989 json_decode() passes through certain invalid JSON strings
Submitted: 2008-09-04 00:32 UTC Modified: 2009-07-27 03:44 UTC
Votes:20
Avg. Score:4.2 ± 0.8
Reproduced:18 of 19 (94.7%)
Same Version:16 (88.9%)
Same OS:3 (16.7%)
From: steven at acko dot net Assigned:
Status: Closed Package: JSON related
PHP Version: 5.2.6 OS: Mac OS X
Private report: No CVE-ID: None
View Developer Edit
Welcome! If you don't have a Git account, you can't do anything here.
If you reported this bug, you can edit this bug over here.
(description)
Block user comment
Status: Assign to:
Package:
Bug Type:
Summary:
From: steven at acko dot net
New email:
PHP Version: OS:

 

 [2008-09-04 00:32 UTC] steven at acko dot net
Description:
------------
When json_decode() is given certain invalid JSON strings, it will return 
the literal string as the result, rather than returning NULL.

Note: in #38680, the decision was made to allow json_decode() to accept 
literal basic types (strings, ints, ...) even though this is not allowed 
by RFC 4627 (which only allows objects/arrays). This bug report is 
different because even under the PHP interpretation of JSON, these 
strings can not be considered valid, and trivial variations on them do 
in fact throw an error as expected.

(The non-standard behaviour introduced in #38680 is not documented at 
all by the way, which is kind of ironic given the numerous issues that 
have 'go read the spec' as the answer)




Reproduce code:
---------------
var_dump(json_decode("'invalid json'"));
var_dump(json_decode('invalid json'));
var_dump(json_decode(' {'));
var_dump(json_decode(' ['));



Expected result:
----------------
NULL
NULL
NULL
NULL

Actual result:
--------------
string(14) "'invalid json'"
string(12) "invalid json"
string(2) " {"
string(2) " ["






Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2008-09-10 00:38 UTC] iliaa@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

.
 [2008-09-10 01:14 UTC] steven at acko dot net
Please clarify the bogus classification.

The following each returns NULL, as expected:

var_dump(json_decode('['));             // unmatched bracket
var_dump(json_decode('{'));             // unmatched brace
var_dump(json_decode('{}}'));           // unmatched brace
var_dump(json_decode('{error error}')); // invalid object key/value notation
var_dump(json_decode('["\"]'));         // unclosed string
var_dump(json_decode('[" \x "]'));      // invalid escape code

Yet the following each returns the literal argument as a string:

var_dump(json_decode(' ['));
var_dump(json_decode(' {'));
var_dump(json_decode(' {}}'));
var_dump(json_decode(' {error error}')); 
var_dump(json_decode('"\"'));
var_dump(json_decode('" \x "')); 

Please examine the examples closely: they are all meaningless, invalid JSON. Even under the 
most widely stretched definition of JSON, the above is not JSON encoded data. Yet 
json_decode() arbitarily returns /some of it/ as a string... and in a way that looks 
suspiciously like a bad parser implementation.

If this was merely a case of json_decode() returning /all/ invalid json as is, then it could 
be classified as an implementation quirk. But because of how inconsistent it is now, you 
can't say that it is by design or following any kind of spec.

E.g. how would you currently see if json_decode() succeeded or not?
 [2008-11-17 15:23 UTC] till@php.net
@Iliaa:

Could this bug be re-evaluated or a more detailed explaination as of why the docs sometimes note that "NULL" is returned on invalid json, and why sometimes json_decode() returns the string instead?

If the function returns "whatever" then the docs should be updated to tell the user to not rely on what is returned by json_decode at all. ;-)

I double-checked some of Steve's examples on jsonlint.com (which is in most docs cited as the reference validator for json data) and they all show up as "invalid".

I also build the most recent 5.2.7 snapshot:
./configure --disable-all --enable-json

till@till-laptop:~/php5.2-200811171330$ ./sapi/cli/php test-45989.php 
string(14) "'invalid json'"
string(12) "invalid json"
string(2) " {"
string(2) " ["
till@till-laptop:~/php5.2-200811171330$ ./sapi/cli/php --ini
Configuration File (php.ini) Path: /usr/local/lib
Loaded Configuration File:         (none)
Scan for additional .ini files in: (none)
Additional .ini files parsed:      (none)
till@till-laptop:~/php5.2-200811171330$ ./sapi/cli/php -m
[PHP Modules]
date
json
Reflection
standard

[Zend Modules]


I'm gonna write a test and send it to QA too.
 [2008-12-01 17:16 UTC] till@php.net
Just to add to this:

I know that the function is not supposed to be a JSON validator, but it's supposed to return the string as is -- in case it's a literal type, but why does it in some cases return "null" then?

For example:
$bad_json = "{ 'bar': 'baz' }";
json_decode($bad_json); // null

I know this is "probably" an edge-case but $bad_json could be my own /valid/ string -- not valid JSON. Because a string could look like anything. Point well taken, I'm passing in a pretty /funky/ looking string. But instead of "NULL", json_decode should return the string as-is.

That is, according to the documentation, a bug. ;-)

Lots of people also seemed to rely on json_decode as a json validator. Which is -- once you understand the subtle differences -- not the case.

The case should be made for either one though.
 [2008-12-02 18:52 UTC] steven at acko dot net
till said:
"but it's supposed to return the string as is -- in case it's a literal 
type, but why does it in some cases return "null" then?"

What argument is there for having (some) unparseable sequences returned 
as is? If json_decode() returns a string, then that should mean that the 
input was a valid JSON encoding of that string, no?

The only literal types JSON allows are numbers and the pre-defined 
constants 'true' 'false' and 'null'. Strings must be quote-delimited.

The fact that you can switch between 'return NULL' and 'return the 
argument as-is' just by adding/removing a leading space is a pretty big 
sign that something is wrong here. To be honest, it seems a bit silly 
that this is even an argument.
 [2008-12-03 21:10 UTC] magicaltux@php.net
Ok guys, I've had a look at the CVS history for json, and checked why it was following this weird behaviour (returning what was passed in some cases, and NULL in other cases).

The CVS commit log message for this relates to bug #38680, however it seems that the behaviour in parsing strings not handled by json is doing too much to try to "fix" things and find a way to provide parsed value.

Anyway here's a patch that changes this behaviour to make json_decode() return NULL when we get invalid JSON data, while still keeping null, true, false and integers parsing.

Some tests were fixed (the result depended on broken behaviour), and the other tests still run fine.


The patch itself, against PHP_5_2:

http://ookoo.org/svn/snip/php_5_2-json-returntype-final-fix.patch


If nobody can find anything against this (being a bit more strict with obviously wrong values) I'll add patchs against HEAD and PHP_5_3.
 [2008-12-03 21:17 UTC] magicaltux@php.net
Just a note for documentation:

http://docs.php.net/json_decode

Right now the documentation says the function returns an object, OR an array. This is not strictly true as it may return a string, a boolean, an integer, a double... depending on the input.

Also, the fact json_decode() may return NULL on error isn't explicitly documented either, instead some examples which happens to return NULL with the current implementation are provided. I think it would be a good idea to explicitly document this behavior, if the change I'm proposing here is accepted.
 [2008-12-03 22:29 UTC] magicaltux@php.net
And here are patches against PHP_5_3 and HEAD:

http://ookoo.org/svn/snip/php_5_3-json-returntype-final-fix.patch

http://ookoo.org/svn/snip/php_head-json-returntype-final-fix.patch

Some tests now work on json on HEAD (less failure than what's currently displayed on gcov.php.net) but still two fails. As those failures are not within the scope of this bug (and are specific to HEAD) they be fixed in different patches.

I believe that once this is commited to the CVS, this bug should be marked as "To be documented". I also believe till wants to submit some additional tests for those this issue...
 [2008-12-12 07:51 UTC] kevin at metalaxe dot com
The JSON spec states:
"
A JSON text is a sequence of tokens.  The set of tokens includes six
   structural characters, strings, numbers, and three literal names.

A JSON text is a serialized object or array.
"

So, in order to maintain compliance, PHP must also support non-objects/arrays as input properly.

If I understand your patch correctly:

If the input is json_decode("null"); the output would be NULL (I saw no test case for null input in the patch itself). We would have no way of knowing a problem exists if one were to have an input of json_decode('[');.

Can't this function throw an exception on failure? Failing that,could we at least get a PHP warning? Otherwise it will be impossible to full rely on this function in the case where null is the actual input...
 [2008-12-12 23:21 UTC] scottmac@php.net
Applied the patch provided by magicaltux
 [2008-12-23 11:24 UTC] bruno dot p dot reis at gmail dot com
I agree with kevin at metalaxe dot com, 

throwing an exception (may be even a InvalidArgumentException) on malformed json would be a much more decent way to say the json is invalid and would clarify a lot the behaviour. 

Other good thing would be another function just to check if a json is valid or not.
 [2009-07-27 03:44 UTC] scottmac@php.net
This is in 5.3 now, you can use json_last_error() to check if a syntax error occurred rather than reading the return value.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Dec 03 17:01:29 2024 UTC