PHP :: Bug #52810 :: substr() and $string[n] corrupt multi-byte UTF-8 strings

Bug #52810	substr() and $string[n] corrupt multi-byte UTF-8 strings
Submitted: 2010-09-10 12:46 UTC
Modified: 2010-09-10 13:55 UTC
From: trane at gol dot com
Assigned:
Status: Not a bug
Package: Strings related
PHP Version: Irrelevant
OS: OS X 10.6.4
Private report: No
CVE-ID: None

[2010-09-10 12:46 UTC] trane at gol dot com

Description:
------------
(PHP 5.3.2 (cli) (built: Aug  7 2010 00:04:41) 
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies)

When trying to extract a single character from a UTF-8-encoded Japanese string, instead of the expected character, one gets the dreaded black-diamond-question-mark-of-death.



Test script:
---------------
$s_string = "静岡は蒸し暑いです。";
echo $s_string[3], "<p />";
// expected output is 蒸
// actual output is �
print_r($s_string[3]);
// expected output is 蒸
// actual output is �
echo "<p />";
$sub = substr($s_string, 3, 1);
echo $sub, "<p />";
// expected output is 蒸
// actual output is �

Expected result:
----------------
Expected output is 蒸



Actual result:
--------------
Actual output is �

History


[2010-09-10 13:55 UTC] cataphract@php.net

-Status: Open
+Status: Bogus

[2010-09-10 13:55 UTC] cataphract@php.net

This is not a bug.

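substr(), $str[n] and $str{n} all treat the string as a byte array. Every character in the test string occupies three bytes in UTF-8, so byte offset 3 is the first byte of 岡, not the character 蒸; a single lead byte is not a valid UTF-8 sequence on its own, which is why it renders as �. If you want to get the n-th Unicode code point, use mb_substr().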
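
To illustrate the difference between the byte view and the character view, here is a minimal sketch using the reporter's string. It assumes the script file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are made up for illustration. The encoding is passed explicitly because the default returned by mb_internal_encoding() differs between PHP versions and configurations.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext/mbstring is available.
$s = "静岡は蒸し暑いです。";

// Byte view: strlen() and $s[n] count and index bytes, not characters.
$byteLength = strlen($s);        // 10 characters x 3 bytes each
$byteAt3    = bin2hex($s[3]);    // hex of the single byte at offset 3

// Character view: mb_* functions work in code points for the given encoding.
$charLength = mb_strlen($s, 'UTF-8');
$charAt3    = mb_substr($s, 3, 1, 'UTF-8');

echo $byteLength, ' ', $byteAt3, ' ', $charLength, ' ', $charAt3, "\n";
// prints: 30 e5 10 蒸
```

The byte e5 is the first of the three bytes that encode 岡 (e5 b2 a1), so echoing it alone produces an invalid sequence. Note also that the $str{n} syntax mentioned above was deprecated in PHP 7.4 and removed in PHP 8.0; $str[n] is the remaining form, and it is still byte-based.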



Copyright © 2001-2025 The PHP Group All rights reserved.	Last updated: Fri Jul 18 23:00:02 2025 UTC