PHP :: Bug #52810 :: substr() and $string[n] corrupt multi-byte UTF-8 strings

Bug #52810	substr() and $string[n] corrupt multi-byte UTF-8 strings
Submitted:	2010-09-10 12:46 UTC	Modified:	2010-09-10 13:55 UTC
From:	trane at gol dot com	Assigned:
Status:	Not a bug	Package:	Strings related
PHP Version:	Irrelevant	OS:	OS X 10.6.4
Private report:	No	CVE-ID:	None


[2010-09-10 12:46 UTC] trane at gol dot com

Description:
------------
(PHP 5.3.2 (cli) (built: Aug  7 2010 00:04:41) 
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies)

When trying to extract a single character from a UTF-8-encoded Japanese string, instead of the expected character, one gets the dreaded black-diamond-question-mark-of-death.



Test script:
---------------
$s_string = "静岡は蒸し暑いです。";
echo $s_string[3], "<p />";
// expected output is 蒸
// actual output is �
print_r($s_string[3]);
// expected output is 蒸
// actual output is �
echo "<p />";
$sub = substr($s_string, 3, 1);
echo $sub, "<p />";
// expected output is 蒸
// actual output is �

Expected result:
----------------
Expected output is 蒸



Actual result:
--------------
Actual output is �


[2010-09-10 13:55 UTC] cataphract@php.net

-Status: Open
+Status: Bogus

[2010-09-10 13:55 UTC] cataphract@php.net

This is not a bug.

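substr(), $str[n] and $str{n} treat the string as a byte array. Each character in the test string occupies three bytes in UTF-8, so offset 3 with length 1 returns only the first byte of 岡, which is not valid UTF-8 on its own. If you want to get the n-th Unicode code point, use mb_substr().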
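
A minimal sketch of the difference, reusing the string from the test script. It assumes the mbstring extension is loaded and that the source file is saved as UTF-8; the variable names other than $s_string are illustrative and not part of the original report. The explicit 'UTF-8' argument is there so the result does not depend on the configured internal encoding.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded and that this
// source file itself is saved as UTF-8.
$s_string = "静岡は蒸し暑いです。";

// Byte view: every character in this particular string takes 3 bytes in UTF-8.
$byteLength = strlen($s_string);               // counts bytes
$charLength = mb_strlen($s_string, 'UTF-8');   // counts code points

// $s_string[3] is a single byte: the lead byte of the second character.
$loneByte = bin2hex($s_string[3]);

// Byte offsets still work when they fall on character boundaries:
// the fourth character starts at byte 9 and is 3 bytes long.
$byBytes = substr($s_string, 9, 3);

// Character view: offset and length are counted in code points.
$byChars = mb_substr($s_string, 3, 1, 'UTF-8');

// Alternative without mbstring: split into code points with PCRE's /u modifier.
$chars = preg_split('//u', $s_string, -1, PREG_SPLIT_NO_EMPTY);
$bySplit = $chars[3];

echo $byteLength, " bytes, ", $charLength, " characters\n";
echo "byte 3 = 0x", $loneByte, "\n";
echo $byBytes, " ", $byChars, " ", $bySplit, "\n";
```

The hand-computed byte offset in substr($s_string, 9, 3) only works because every character here is exactly three bytes wide. For mixed-width text (ASCII next to Japanese, for example), mb_substr() or the preg_split() approach is likely the safer choice.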



Copyright © 2001-2025 The PHP Group All rights reserved.	Last updated: Tue Oct 28 18:00:01 2025 UTC