Bug #52810 substr() and $string[n] corrupt multi-byte UTF-8 strings
Submitted: 2010-09-10 12:46 UTC
Modified: 2010-09-10 13:55 UTC
From: trane at gol dot com
Assigned:
Status: Not a bug
Package: Strings related
PHP Version: Irrelevant
OS: OS X 10.6.4
Private report: No
CVE-ID: None


 [2010-09-10 12:46 UTC] trane at gol dot com
(PHP 5.3.2 (cli) (built: Aug  7 2010 00:04:41) 
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies)

When trying to extract a single character from a UTF-8-encoded Japanese string, instead of the expected character, one gets the dreaded black-diamond-question-mark-of-death.

Test script:
$s_string = "静岡は蒸し暑いです。";
echo $s_string[3], "<p />";
// expected output is 蒸
// actual output is �
echo "<p />";
$sub = substr($s_string, 3, 1);
echo $sub, "<p />";
// expected output is 蒸
// actual output is �

Expected result:
Expected output is 蒸

Actual result:
Actual output is �


 [2010-09-10 13:55 UTC]
-Status: Open +Status: Bogus
 [2010-09-10 13:55 UTC]
This is not a bug.

substr and $str[n] or $str{n} treat the string as a byte array. If you want to get the n-th Unicode code point, use mb_substr.
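
To illustrate the difference, here is a minimal sketch using the string from the test script. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the explicit 'UTF-8' argument is an addition for illustration and is not part of the original report. Each character in this string occupies three bytes in UTF-8, so byte offset 3 falls on the first byte of the second character, which is why the byte-oriented call returns an incomplete sequence.

```php
<?php
// Sketch only: assumes UTF-8 source encoding and the mbstring extension.
$s_string = "静岡は蒸し暑いです。";

// Byte-oriented functions: offsets and lengths count bytes.
$byte_length = strlen($s_string);               // 10 characters * 3 bytes each
$single_byte = bin2hex(substr($s_string, 3, 1)); // lead byte of 岡, not a full character

// Multibyte functions: offsets and lengths count code points.
$char_length = mb_strlen($s_string, 'UTF-8');
$single_char = mb_substr($s_string, 3, 1, 'UTF-8');

echo $byte_length, "\n"; // 30
echo $single_byte, "\n"; // e5
echo $char_length, "\n"; // 10
echo $single_char, "\n"; // 蒸
```

Passing the encoding explicitly avoids depending on the configured internal encoding, which may differ between installations.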
Last updated: Mon Mar 30 20:01:24 2020 UTC