Bug #68303 str_split returns garbage when a certain UTF-8 character is passed
Submitted: 2014-10-24 20:07 UTC
Modified: 2014-10-24 20:18 UTC
From: adrya dot stembridge at gmail dot com
Assigned:
Status: Not a bug
Package: Output Control
PHP Version: 5.6.2
OS: CentOS 6.5
Private report: No
CVE-ID: None
 [2014-10-24 20:07 UTC] adrya dot stembridge at gmail dot com
Description:
------------
I am attempting to break up a string of initials in order to add a period after each letter; for example, AMA becomes A.M.A. I'm using str_split to generate an array, which I then glue back together with periods.

The problem arises when the string contains certain UTF-8 characters, for example "ŞZ": str_split does not appear to handle the UTF-8 encoding of the first letter. I've added some debug output in the code snippet below.

Test script:
---------------
	$str_initials = "ŞZ";

	echo "<p><b>\$str_initials</b> $str_initials<p>";
	
	$arr_initials 	= str_split($str_initials);
	print_r($arr_initials);
	echo "<p>";
	$arr_ni = array(); // start from an empty list so implode() always receives an array
	foreach ($arr_initials as $initial)
	{
		echo "<b>\$initial</b> $initial<br>";
		$arr_ni[] = $initial . '.';
	}
	$str_initials = implode($arr_ni);

	echo "<p><b>\$str_initials</b> $str_initials";

Expected result:
----------------
ŞZ becomes Ş.Z.

Actual result:
--------------
ŞZ becomes �.�.Z.
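
The extra garbage character can be traced by looking at the bytes. In UTF-8, "Ş" (U+015E) is encoded as the two bytes C5 9E, and str_split() cuts the string one byte at a time. The following is a minimal sketch of that check; the variable names are illustrative, and the string is written with byte escapes so the example does not depend on the file's own encoding.

```php
<?php
// "ŞZ" written as raw UTF-8 bytes: C5 9E for "Ş", 5A for "Z".
$str_initials = "\xC5\x9EZ";

// strlen() counts bytes, not characters.
$byte_count = strlen($str_initials);

// str_split() returns one array element per byte.
$pieces = str_split($str_initials);

// Show each piece as hex so the lone bytes are visible.
$hex = array_map('bin2hex', $pieces);

echo $byte_count, "\n";
echo implode(' ', $hex), "\n";
```

Neither C5 nor 9E is a valid UTF-8 sequence on its own, so each is displayed as a replacement character once a period is placed between them.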

 [2014-10-24 20:18 UTC] aharvey@php.net
-Status: Open
+Status: Not a bug
 [2014-10-24 20:18 UTC] aharvey@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

str_split() isn't multibyte aware; http://php.net/str_split#107658 describes a method of using preg_split() to implement a function similar to str_split() that is.
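
As a rough sketch of the preg_split() approach mentioned above (not necessarily the exact code from the linked manual note), an empty pattern with the /u modifier splits a UTF-8 string between code points. The function name utf8_str_split is hypothetical and not part of PHP. Note that this splits on code points, so a letter followed by a combining mark would still come out as two elements.

```php
<?php
// Hypothetical helper (not a PHP built-in): split a UTF-8 string
// into an array of characters (code points).
function utf8_str_split($str)
{
    // The empty pattern with /u matches between code points;
    // PREG_SPLIT_NO_EMPTY drops the empty strings at both ends.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure, e.g. invalid UTF-8 input.
    return $chars === false ? array() : $chars;
}

$initials = "\xC5\x9EZ"; // "ŞZ" as UTF-8 bytes
$parts = utf8_str_split($initials);

// Append a period to each character, as in the original report.
$dotted = '';
foreach ($parts as $ch) {
    $dotted .= $ch . '.';
}
echo $dotted, "\n";
```

With the input "ŞZ" this yields two elements rather than three, so the joined result matches the expected "Ş.Z.".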
 [2014-10-24 20:34 UTC] php at requinix dot net
I don't normally critique code here but there's a much simpler way to do all that:
  $str_initials = preg_replace('/./u', '$0.', $str_initials);
(UTF-8 only)
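
For illustration, the one-liner above can be wrapped in a small function that also checks for failure: with the /u modifier, preg_replace() returns null when the subject is not valid UTF-8. The function name add_initial_periods is hypothetical, and throwing an exception on failure is just one possible choice.

```php
<?php
// Hypothetical wrapper (not a PHP built-in) around the preg_replace() call.
function add_initial_periods($str)
{
    // '.' with /u matches one whole code point; '$0.' re-emits it
    // followed by a period. Newlines are not matched by '.' here.
    $result = preg_replace('/./u', '$0.', $str);

    // With /u, invalid UTF-8 in the subject makes preg_replace() return null.
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

echo add_initial_periods("\xC5\x9EZ"), "\n"; // "ŞZ" as UTF-8 bytes
echo add_initial_periods("AMA"), "\n";
```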
 
Last updated: Fri Dec 27 05:01:27 2024 UTC