Bug #43957 utf8_decode() provides random output for bad input
Submitted: 2008-01-29 02:35 UTC Modified: 2008-01-29 23:21 UTC
From: penguin@php.net Assigned: rasmus (profile)
Status: Closed Package: Unknown/Other Function
PHP Version: 5.2.5 OS: linux debian 4.0
Private report: No CVE-ID: None
 [2008-01-29 02:35 UTC] penguin@php.net
Description:
------------
utf8_decode() outputs a random character when supplied with bad input.

When the input contains an invalid sequence, utf8_decode() usually replaces it with the character "?". But when a lone high-bit character is the last byte of the input, the output seems to be a random character.


Reproduce code:
---------------
for($a=0;$a<20;$a++)printf("%02x ",utf8_decode(chr(0xE0)));


Expected result:
----------------
3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f

(utf8_decode() returns a question mark)

Actual result:
--------------
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
or
09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09
or
05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
or
02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
or some other random value

It seems to differ more between individual runs:

$ for a in `seq 1 20`; do php -r 'printf("%02x ",utf8_decode(chr(0xE0)));'; done
08 00 00 02 00 00 00 00 00 05 00 00 00 05 00 00 07 00 09 00
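
A note on reading the output above: the `%x` conversion formats its argument as an integer, so the string returned by utf8_decode() is converted to a number before it is printed. A minimal sketch, assuming the function returned a literal "?" (the variable names are illustrative), shows how the same byte looks through printf, bin2hex() and ord():

```php
<?php
// Assumed return value for invalid input, as described in the report.
$decoded = "?";

// %x converts its argument to an integer first; "?" is not numeric,
// so it becomes 0 and prints as "00".
$viaPrintf = sprintf("%02x", $decoded);

// bin2hex() and ord() inspect the byte itself (0x3f for "?").
$viaBin2hex = bin2hex($decoded);
$viaOrd = sprintf("%02x", ord($decoded));

echo $viaPrintf, " ", $viaBin2hex, " ", $viaOrd, "\n";
```

Under that assumption, swapping `printf("%02x ", ...)` for `echo bin2hex(...), " "` in the reproduce script would dump every byte of the returned string, which makes an empty result, a "?" and stray bytes distinguishable.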

History

 [2008-01-29 21:59 UTC] rasmus@php.net
I see the bug in the code.  I still think this function needs to die, but I guess we have to continue supporting it.  I'll fix it.
 [2008-01-29 23:21 UTC] rasmus@php.net
Fixed in CVS
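
The thread does not say what the fix changed. As an illustration of the behaviour the report expects, here is a minimal pure-PHP sketch of a UTF-8 to ISO-8859-1 decoder that checks how many bytes remain before reading continuation bytes. The function name `utf8_to_latin1_sketch` is hypothetical, and this is not the C implementation inside PHP.

```php
<?php
// Hypothetical helper: decode UTF-8 to ISO-8859-1, emitting "?" for
// anything invalid, truncated, or outside the Latin-1 range.
function utf8_to_latin1_sketch($s)
{
    $out = '';
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        if ($b < 0x80) {                      // plain ASCII
            $out .= $s[$i];
            $i++;
            continue;
        }
        // Number of continuation bytes, initial bits, smallest legal value.
        if ($b >= 0xC2 && $b <= 0xDF) {
            $need = 1; $cp = $b & 0x1F; $min = 0x80;
        } elseif ($b >= 0xE0 && $b <= 0xEF) {
            $need = 2; $cp = $b & 0x0F; $min = 0x800;
        } elseif ($b >= 0xF0 && $b <= 0xF4) {
            $need = 3; $cp = $b & 0x07; $min = 0x10000;
        } else {                              // stray continuation or bad lead
            $out .= '?';
            $i++;
            continue;
        }
        // Bounds check: never read past the end of the input.
        $valid = ($i + $need < $len);
        for ($k = 1; $valid && $k <= $need; $k++) {
            $c = ord($s[$i + $k]);
            if (($c & 0xC0) !== 0x80) {
                $valid = false;               // not a continuation byte
            } else {
                $cp = ($cp << 6) | ($c & 0x3F);
            }
        }
        if (!$valid) {                        // skip only the lead byte
            $out .= '?';
            $i++;
            continue;
        }
        // Reject overlong forms and code points Latin-1 cannot represent.
        $out .= ($cp >= $min && $cp <= 0xFF) ? chr($cp) : '?';
        $i += $need + 1;
    }
    return $out;
}

echo bin2hex(utf8_to_latin1_sketch(chr(0xE0))), "\n";
```

With this sketch the lone lead byte from the report, `chr(0xE0)`, always decodes to a single "?", because the bounds check fails before any continuation byte is read.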
 