Bug #43941 json_encode silently cuts non-UTF8 strings
Submitted: 2008-01-25 22:19 UTC
Modified: 2008-01-30 16:12 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:0 (0.0%)
Same OS:1 (100.0%)
From: stas at zend dot com
Assigned:
Status: Closed
Package: JSON related
PHP Version: 5.3CVS-2008-01-25 (CVS)
OS: *
Private report: No
CVE-ID: None
 [2008-01-25 22:19 UTC] stas at zend dot com
Description:
------------
Right now, if json_encode encounters invalid UTF-8 data, it just cuts the string off at the invalid byte: no error is returned and no message is produced.

I think it's not a good idea to just silently cut the data. In fact, I think it is a bug caused by this code in ext/json/utf8_to_utf16.c:
        if (c < 0) {
            return UTF8_END ? the_index : UTF8_ERROR;
        }
which inherited this bug from code published on json.org. It should be:
        if (c < 0) {
            return (c == UTF8_END) ? the_index : UTF8_ERROR;
        }
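
To illustrate why the original condition never reports an error, here is a minimal standalone sketch. It assumes the sentinel values are UTF8_END = -1 and UTF8_ERROR = -2, as I believe they are in the json.org decoder header; the function names finish_buggy and finish_fixed are hypothetical and exist only for this illustration. They model just the return expression, not the actual decoder.

```c
/* Assumed sentinel values (believed to match json.org's utf8_decode.h). */
#define UTF8_END   (-1)
#define UTF8_ERROR (-2)

/*
 * Models the original return expression. UTF8_END is a nonzero constant,
 * so the condition is always true and the_index is returned even when
 * c holds UTF8_ERROR. The caller therefore sees a normal end of input.
 */
int finish_buggy(int c, int the_index)
{
    (void)c; /* c is never consulted: that is the bug */
    return UTF8_END ? the_index : UTF8_ERROR;
}

/*
 * Models the corrected expression. Only a genuine end of input returns
 * the_index; any other negative value is reported as UTF8_ERROR.
 */
int finish_fixed(int c, int the_index)
{
    return (c == UTF8_END) ? the_index : UTF8_ERROR;
}
```

With these definitions, finish_buggy(UTF8_ERROR, 2) returns 2, which looks like a successful conversion of two characters, while finish_fixed(UTF8_ERROR, 2) returns UTF8_ERROR. That would be consistent with "ab\xE0" being encoded as "ab" with no error.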

Reproduce code:
---------------
var_dump(json_encode("ab\xE0"));
var_dump(json_encode("ab\xE0\""));

Expected result:
----------------
Some error message

Actual result:
--------------
Just:
""ab""
""ab""

History

 [2008-01-30 03:22 UTC] stas@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 [2008-01-30 16:12 UTC] rasmus@php.net
I sent this on to json.org as well, and it is now fixed there.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Mar 19 08:01:29 2024 UTC