Bug #63898 json_encode sets string to null for invalid characters
Submitted: 2013-01-04 01:04 UTC Modified: 2017-08-20 17:21 UTC
Votes:5
Avg. Score:4.0 ± 0.9
Reproduced:4 of 4 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: sreed at ontraport dot com Assigned: bukka (profile)
Status: Closed Package: JSON related
PHP Version: 5.4.10 OS: All
Private report: No CVE-ID: None
 [2013-01-04 01:04 UTC] sreed at ontraport dot com
Description:
------------
When json_encode is given a string containing an invalid UTF-8 byte sequence, PHP
generates a warning (with display_errors set to off) and the function returns an
invalid JSON string. The string with the invalid UTF-8 byte sequence is
replaced with null (for example: {null:""}). This is invalid JSON and cannot be
decoded with json_decode.

I would expect that json_encode never returns an invalid JSON string. It should
either return false on failure, as the documentation states, or handle the
invalid UTF-8 byte sequence in a way that does not corrupt the JSON string.

Test script:
---------------
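<?php
// chr(163) is a lone 0xA3 byte (the pound sign in ISO-8859-1), which is not valid UTF-8.
$key = "Foo " . chr(163);

$array = array($key => "");

var_dump($array);

// The invalid byte sequence is in the array key, not in the value.
$json = json_encode($array);

echo $json."\n";

var_dump(json_decode($json));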

Expected result:
----------------
I would expect the returned json string to be valid or for json_encode to return 
false. 

Actual result:
--------------
array(1) {
  ["Foo �"]=>
  string(0) ""
}
{null:""}
NULL


 [2013-01-04 01:06 UTC] sreed at ontraport dot com
-: sreed at sendpepper dot com +: sreed at ontraport dot com
 [2013-01-04 01:06 UTC] sreed at ontraport dot com
.
 [2013-01-06 11:35 UTC] Sjon at hortensius dot net
This actually worked fine before 5.3.14 but was broken in 5.3.14:
 
http://3v4l.org/Eouni#v5314

5.2.0 - 5.2.6 would truncate the character without notice but wouldn't produce 
invalid json either
 [2013-03-30 17:08 UTC] programming at stefan-koch dot name
I was able to locate the bug, but I am too unfamiliar with the PHP source to know how best to fix it.

For keys, just like for values, "json_escape_string" is used. In PHP 5.4 (unlike PHP 5.2) there is a check for invalid UTF-8 sequences. In PHP 5.2.0 this check did not exist; instead, when the input was either invalid or empty, an empty string was printed.

So the location of the problem is line 432 in ext/json/json.c (PHP 5.4.12) or around line 442 in git master (commit ac9f53dd9c0b184bab14d669c72971c0405ed488).

My idea would be, if one wants to keep printing 'null', to pass an additional argument to "json_escape_string" that says whether the string is a key or a value (they seem to need different treatment, since null is not allowed as a key in JSON).
An alternative would be to insert an empty string when an invalid UTF-8 sequence is found. This would be a very easy fix that goes back to the old behavior. However, I guess somebody introduced null for a reason.
Or you could return false if an error occurred, but from my Python experience I really dislike this treatment. It is correct, but it very often leads to non-working code due to encoding problems (at least when you receive data from somewhere else).
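
For comparison, the "empty string" option above can be approximated in userland before calling json_encode. This is only a minimal sketch, not the fix that went into ext/json: the helper names is_valid_utf8 and blank_invalid_utf8 are hypothetical, and the UTF-8 check relies on preg_match with the /u modifier failing on malformed input.

```php
<?php
// Hypothetical helper (not part of PHP): true when $s is valid UTF-8.
// With the /u modifier, preg_match() returns false for a malformed subject.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical helper: recursively replace every key or value that is not
// valid UTF-8 with an empty string, mirroring the old 5.2 behavior.
// Note: several invalid keys in one array collapse into the single key ''.
function blank_invalid_utf8(array $data)
{
    $clean = array();
    foreach ($data as $key => $value) {
        if (is_string($key) && !is_valid_utf8($key)) {
            $key = '';
        }
        if (is_array($value)) {
            $value = blank_invalid_utf8($value);
        } elseif (is_string($value) && !is_valid_utf8($value)) {
            $value = '';
        }
        $clean[$key] = $value;
    }
    return $clean;
}

$array = array("Foo " . chr(163) => "", "ok" => "bar" . chr(163));
$json  = json_encode(blank_invalid_utf8($array));

echo $json . "\n";
```

The obvious drawback is silent data loss, which is probably why an error return was preferred in the end.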
 [2013-03-30 18:53 UTC] programming at stefan-koch dot name
Fixed in the current git master (see rev in my comment above). So it will be fixed in PHP 5.5
Just compiled from git and it returns 'false' when there are illegal characters.

Will return false in all cases when there is an error (check in implementation of json_encode).
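
With a build that returns false on error, calling code has to check the return value explicitly. A minimal sketch, assuming PHP 5.5 or later (for json_last_error_msg); the wrapper name encode_or_fail is hypothetical:

```php
<?php
// Hypothetical wrapper: turn the false return value into an exception so
// that an encoding problem cannot be mistaken for valid output.
function encode_or_fail($value)
{
    $json = json_encode($value);
    if ($json === false) {
        throw new RuntimeException(
            'json_encode failed: ' . json_last_error_msg(),
            json_last_error()
        );
    }
    return $json;
}

$caught = null;
try {
    // Same input as the test script in this report.
    encode_or_fail(array("Foo " . chr(163) => ""));
} catch (RuntimeException $e) {
    $caught = $e->getCode();
    echo $e->getMessage() . "\n";
}
```

The strict comparison with false matters, because a successful call always returns a string (encoding the value false itself yields the string "false").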
 [2014-01-27 23:50 UTC] gbarros at yahoo-inc dot com
I tried the code in git/svn HEAD right now (last commit 
aafce7353e "merge branch 5.6") so I assume whichever code discussed here is included (no patch on this report).

I added a test with a string containing a tab and it does not return false. I think it should remove inescapable chars (and maybe log a warning), and for chars that have an escape sequence it should just encode them.


--TEST--
json_encode() tests
--SKIPIF--
<?php if (!extension_loaded("json")) print "skip"; ?>
--FILE--
<?php
var_dump(json_encode('a'.chr(9).'b')); // char with usable escape sequence (tab)
var_dump(json_encode('a'.chr(1).'b')); // char with no usable escape sequence
?>
--EXPECTREGEX--
string\(4\) \"a\tb\"
string\(2\) \"ab\"
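
As a side note on the test above: JSON allows any control character to be written as a \uXXXX escape, so chr(1) is not actually inescapable. A small sketch of what json_encode does with both inputs (the variable names are only for illustration):

```php
<?php
$tab  = json_encode('a' . chr(9) . 'b'); // tab has the short escape \t
$ctrl = json_encode('a' . chr(1) . 'b'); // other control chars use \uXXXX

var_dump($tab, $ctrl);

// Both results are valid JSON and decode back to the original bytes.
var_dump(json_decode($tab) === 'a' . chr(9) . 'b');
var_dump(json_decode($ctrl) === 'a' . chr(1) . 'b');
```

So neither input is an encoding error; only byte sequences that are not valid UTF-8 are.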
 [2014-01-28 18:34 UTC] gbarros at yahoo-inc dot com
I think this is now a duplicate of https://bugs.php.net/bug.php?id=65082 ?
 [2016-08-08 12:35 UTC] cmb@php.net
-Status: Open +Status: Verified
 [2016-08-08 12:35 UTC] cmb@php.net
For reference: <https://3v4l.org/gMA2I>.

The behavior as of PHP 7.0.0 is okay (it would have to be
documented that also NULL can be returned on failure), but the
behavior of PHP 5.6 seems to be erroneous.
 [2017-08-20 17:21 UTC] bukka@php.net
-Status: Verified +Status: Closed -Assigned To: +Assigned To: bukka
 [2017-08-20 17:21 UTC] bukka@php.net
This is no longer an issue in PHP 7.x (the only release receiving bug fixes like this). In addition, PHP 7.2 introduces new constants for replacing or ignoring invalid UTF-8 characters.
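
For reference, the PHP 7.2 constants are JSON_INVALID_UTF8_IGNORE and JSON_INVALID_UTF8_SUBSTITUTE. A minimal sketch of how they apply to the test script from this report, assuming PHP 7.2 or later:

```php
<?php
$array = array("Foo " . chr(163) => "");

// Default behavior: encoding fails and json_encode() returns false.
$strict      = json_encode($array);
$strictError = json_last_error();

// Drop the invalid bytes from the output.
$ignored = json_encode($array, JSON_INVALID_UTF8_IGNORE);

// Replace each invalid sequence with U+FFFD (REPLACEMENT CHARACTER).
$substituted = json_encode($array, JSON_INVALID_UTF8_SUBSTITUTE);

var_dump($strict, $ignored, $substituted);
```

Both flags also apply to array keys, which is the case this report was originally about.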
 