Bug #68303 str_split returns garbage when a certain UTF-8 character is passed
Submitted: 2014-10-24 20:07 UTC
Modified: 2014-10-24 20:18 UTC
From: adrya dot stembridge at gmail dot com
Assigned:
Status: Not a bug
Package: Output Control
PHP Version: 5.6.2
OS: CentOS 6.5
Private report: No
CVE-ID: None
 [2014-10-24 20:07 UTC] adrya dot stembridge at gmail dot com
Description:
------------
I am trying to break up a string of initials so that I can add a period after each letter, e.g. AMA becomes A.M.A. I use str_split() to generate an array, which I then glue back together with periods.

The problem occurs when the string contains a certain UTF-8 character, for example "ŞZ". str_split() does not appear to handle the UTF-8 encoding of the first letter. I've added some debug output to the code snippet below.

Test script:
---------------
	$str_initials = "ŞZ";

	echo "<p><b>\$str_initials</b> $str_initials<p>";
	
	$arr_initials 	= str_split($str_initials);
	print_r($arr_initials);
	echo "<p>";
	foreach ($arr_initials as $initial)
	{
		echo "<b>\$initial</b> $initial<br>";
		$arr_ni[] = $initial . '.';
	}
	$str_initials = implode($arr_ni);

	echo "<p><b>\$str_initials</b> $str_initials";

Expected result:
----------------
ŞZ becomes Ş.Z.

Actual result:
--------------
ŞZ becomes �.�.Z.

 [2014-10-24 20:18 UTC] aharvey@php.net
-Status: Open
+Status: Not a bug
 [2014-10-24 20:18 UTC] aharvey@php.net
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

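str_split() isn't multibyte aware: it splits the string into single bytes, so the two-byte UTF-8 encoding of Ş comes back as two separate fragments that are not valid characters on their own. http://php.net/str_split#107658 describes a method of using preg_split() to implement a function similar to str_split() that is multibyte aware.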
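
For illustration, a minimal sketch of that preg_split() approach, assuming the input is valid UTF-8. The helper name utf8_str_split() is hypothetical and not part of PHP, and this is not necessarily the exact code from the linked manual note:

```php
<?php
// Hypothetical helper (not part of PHP): split a UTF-8 string into characters.
function utf8_str_split($str)
{
    // With the /u modifier PCRE treats the subject as UTF-8, so the empty
    // pattern matches between code points instead of between bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on failure, e.g. for invalid UTF-8 input.
    return $chars === false ? array() : $chars;
}

$str_initials = "\xC5\x9EZ";                  // "ŞZ" written as explicit UTF-8 bytes
$bytes = str_split($str_initials);            // 3 elements: 0xC5, 0x9E and "Z"
$chars = utf8_str_split($str_initials);       // 2 elements: "Ş" and "Z"
$with_periods = implode('.', $chars) . '.';   // "Ş.Z."
echo $with_periods, "\n";
```

On PHP 7.4 and later, mb_str_split() from the mbstring extension covers the same need without a custom helper.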
 [2014-10-24 20:34 UTC] php at requinix dot net
I don't normally critique code here but there's a much simpler way to do all that:
  $str_initials = preg_replace('/./u', '$0.', $str_initials);
(UTF-8 only)
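
A short sketch of why the /u modifier matters in that one-liner, assuming UTF-8 input. The variable names $with_u and $without_u are only for illustration:

```php
<?php
$str_initials = "\xC5\x9EZ"; // "ŞZ" written as explicit UTF-8 bytes

// With /u, "." matches one whole UTF-8 code point, and "$0." re-emits the
// matched character followed by a period.
$with_u = preg_replace('/./u', '$0.', $str_initials);

// Without /u, "." matches a single byte, so Ş is split in two, much like
// the str_split() result in the original report.
$without_u = preg_replace('/./', '$0.', $str_initials);

echo $with_u, "\n";             // Ş.Z.
echo bin2hex($without_u), "\n"; // c52e9e2e5a2e
```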
 