Bug #52810 substr() and $string[n] corrupt multi-byte UTF-8 strings
Submitted: 2010-09-10 12:46 UTC Modified: 2010-09-10 13:55 UTC
From: trane at gol dot com Assigned:
Status: Not a bug Package: Strings related
PHP Version: Irrelevant OS: OS X 10.6.4
Private report: No CVE-ID: None
 [2010-09-10 12:46 UTC] trane at gol dot com
Description:
------------
(PHP 5.3.2 (cli) (built: Aug  7 2010 00:04:41) 
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies)

When trying to extract a single character from a UTF-8-encoded Japanese string, instead of the expected character, one gets the dreaded black-diamond-question-mark-of-death.



Test script:
---------------
<?php
$s_string = "静岡は蒸し暑いです。";
echo $s_string[3], "<p />";
// expected output is 蒸
// actual output is �
print_r($s_string[3]);
// expected output is 蒸
// actual output is �
echo "<p />";
$sub = substr($s_string, 3, 1);
echo $sub, "<p />";
// expected output is 蒸
// actual output is �

Expected result:
----------------
Expected output is 蒸



Actual result:
--------------
Actual output is �


History

 [2010-09-10 13:55 UTC] cataphract@php.net
-Status: Open +Status: Bogus
 [2010-09-10 13:55 UTC] cataphract@php.net
This is not a bug.

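substr(), $str[n] and $str{n} all treat the string as a byte array, so an offset of 3 selects the fourth byte, not the fourth character. In UTF-8 each character of this string occupies three bytes, which means the result is a lone lead byte that is not valid UTF-8 on its own and is displayed as �. If you want to get the n-th Unicode code point, use mb_substr().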
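
A minimal sketch of the difference, assuming the mbstring extension is loaded and the script file is saved as UTF-8. The variable names $byte and $char are illustrative only. The encoding is passed explicitly because mb_substr() otherwise falls back to the configured internal encoding, which may not be UTF-8 on a given installation.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
$s_string = "静岡は蒸し暑いです。";

// Byte-oriented access: each character here occupies 3 bytes in UTF-8,
// so byte offset 3 is only the first byte of the second character (岡).
$byte = substr($s_string, 3, 1);
echo strlen($s_string), "\n";   // 30 (bytes)
echo bin2hex($byte), "\n";      // e5 (a lone lead byte, not a full character)

// Character-oriented access: offset 3 is the fourth code point.
$char = mb_substr($s_string, 3, 1, 'UTF-8');
echo mb_strlen($s_string, 'UTF-8'), "\n";   // 10 (characters)
echo $char, "\n";                           // 蒸
```

Where mbstring is not available, splitting on code points with preg_split('//u', $s_string, -1, PREG_SPLIT_NO_EMPTY) and indexing the resulting array is one possible alternative, assuming PCRE was built with UTF-8 support.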
 