Bug #52810 substr() and $string[n] corrupt multi-byte UTF-8 strings
Submitted: 2010-09-10 12:46 UTC
Modified: 2010-09-10 13:55 UTC
From: trane at gol dot com
Assigned:
Status: Not a bug
Package: Strings related
PHP Version: Irrelevant
OS: OS X 10.6.4
Private report: No
CVE-ID: None
 [2010-09-10 12:46 UTC] trane at gol dot com
Description:
------------
(PHP 5.3.2 (cli) (built: Aug  7 2010 00:04:41) 
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies)

When trying to extract a single character from a UTF-8-encoded Japanese string, instead of the expected character, one gets the dreaded black-diamond-question-mark-of-death.



Test script:
---------------
<?php
$s_string = "静岡は蒸し暑いです。";
echo $s_string[3], "<p />";
// expected output is 蒸
// actual output is �
print_r($s_string[3]);
// expected output is 蒸
// actual output is �
echo "<p />";
$sub = substr($s_string, 3, 1);
echo $sub, "<p />";
// expected output is 蒸
// actual output is �

Expected result:
----------------
Expected output is 蒸



Actual result:
--------------
Actual output is �


 [2010-09-10 13:55 UTC] cataphract@php.net
-Status: Open +Status: Bogus
 [2010-09-10 13:55 UTC] cataphract@php.net
This is not a bug.

substr() and the offset syntaxes $str[n] and $str{n} treat the string as a byte array, so offset 3 is the fourth byte, not the fourth character. If you want to get the n-th Unicode code point, use mb_substr().
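
A minimal sketch of the difference, reusing the string from the test script. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names `$byte` and `$char` are illustrative only, and the encoding is passed explicitly rather than relying on the configured default.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext/mbstring is available.
$s_string = "静岡は蒸し暑いです。";

// Byte-oriented: offset 3 is the first byte of "岡" (E5 B2 A1 in UTF-8),
// which is not a valid character on its own.
$byte = substr($s_string, 3, 1);

// Character-oriented: offset 3 is the fourth code point.
$char = mb_substr($s_string, 3, 1, 'UTF-8');

echo bin2hex($byte), "\n";                                      // e5
echo $char, "\n";                                               // 蒸
echo strlen($s_string), " ", mb_strlen($s_string, 'UTF-8'), "\n"; // 30 10
```

Every character in this string occupies three bytes in UTF-8, which is why the byte length is three times the character count and why a one-byte slice can never be a complete character here.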
 