Bug #52810 substr() and $string[n] corrupt multi-byte UTF-8 strings
Submitted: 2010-09-10 12:46 UTC
Modified: 2010-09-10 13:55 UTC
From: trane at gol dot com
Assigned:
Status: Not a bug
Package: Strings related
PHP Version: Irrelevant
OS: OS X 10.6.4
Private report: No
CVE-ID: None
 [2010-09-10 12:46 UTC] trane at gol dot com
Description:
------------
(PHP 5.3.2 (cli) (built: Aug  7 2010 00:04:41) 
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies)

When trying to extract a single character from a UTF-8-encoded Japanese string, instead of the expected character, one gets the dreaded black-diamond-question-mark-of-death.



Test script:
---------------
$s_string = "静岡は蒸し暑いです。";
echo $s_string[3], "<p />";
// expected output is 蒸
// actual output is �
print_r($s_string[3]);
// expected output is 蒸
// actual output is �
echo "<p />";
$sub = substr($s_string, 3, 1);
echo $sub, "<p />";
// expected output is 蒸
// actual output is �

Expected result:
----------------
Expected output is 蒸



Actual result:
--------------
Actual output is �


 [2010-09-10 13:55 UTC] cataphract@php.net
-Status: Open
+Status: Bogus
 [2010-09-10 13:55 UTC] cataphract@php.net
This is not a bug.

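substr() and the $str[n] / $str{n} offset syntax treat the string as an array of bytes, not characters. In UTF-8 each character of this string takes three bytes, so offset 3 is the first byte of the second character; a lone lead byte is not valid UTF-8, which is why the replacement character is displayed. If you want to get the n-th Unicode code point, use mb_substr().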
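
For illustration, here is a minimal sketch of the difference, reusing the string from the test script. It assumes the mbstring extension is loaded and that the source file is saved as UTF-8; the variable names $byte and $char are only for this example. The encoding is passed explicitly so the result does not depend on the configured internal encoding.

```php
<?php
// Sketch only: assumes ext/mbstring is available and this file is UTF-8.
$s_string = "静岡は蒸し暑いです。";

// Byte-oriented: offset 3 is the first byte of the second character (岡),
// so the result is a single byte, not a complete UTF-8 sequence.
$byte = substr($s_string, 3, 1);
echo bin2hex($byte), "\n";                  // e5

// Character-oriented: offset 3 is the fourth code point.
$char = mb_substr($s_string, 3, 1, 'UTF-8');
echo $char, "\n";                           // 蒸

// The same split shows up in the length functions.
echo strlen($s_string), "\n";               // 30 (bytes)
echo mb_strlen($s_string, 'UTF-8'), "\n";   // 10 (code points)
```

Note that mb_substr() counts code points, so text that relies on combining marks may still need grapheme-aware handling; that case does not arise with the string above.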
 