Request #67442 mbstring equivalent of `explode`
Submitted: 2014-06-14 10:00 UTC
Modified: -
From: phpmpan at mpan dot pl
Assigned:
Status: Open
Package: mbstring related
PHP Version: Irrelevant
OS:
Private report: No
CVE-ID: None

 [2014-06-14 10:00 UTC] phpmpan at mpan dot pl
Description:
------------
mbstring is missing an equivalent of `explode`.

• `explode` itself can't be used for this purpose, because it fails for encodings that either are not self-synchronizing (like UTF-16) or in which identical byte sequences otherwise do not map uniquely to characters.
• `mb_split` requires a regex as its argument, so it is unsuitable when the delimiter is not known at the time the code is written.
• The `mb_strpos` + `mb_substr` combination is very slow.
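
For reference, the `mb_strpos` + `mb_substr` approach mentioned in the last point might look roughly like the sketch below. The name `mb_explode_sketch` and its parameter order are hypothetical and only illustrate the requested behaviour; this is not an existing mbstring function. It assumes PHP 7 or later (for `\u{...}` escapes and scalar type declarations) and uses `UTF-16BE` explicitly so that no byte order mark is involved.

```php
<?php
// Hypothetical helper, NOT part of mbstring: splits $string on $delimiter
// by character position instead of by raw byte offset.
function mb_explode_sketch(string $delimiter, string $string, string $encoding): array
{
    if ($delimiter === '') {
        throw new InvalidArgumentException('Delimiter must not be empty');
    }

    $delimiterLength = mb_strlen($delimiter, $encoding); // length in characters
    $parts = [];
    $offset = 0; // character offset of the current element

    while (($position = mb_strpos($string, $delimiter, $offset, $encoding)) !== false) {
        $parts[] = mb_substr($string, $offset, $position - $offset, $encoding);
        $offset = $position + $delimiterLength;
    }

    // Whatever follows the last delimiter (possibly an empty string).
    $parts[] = mb_substr($string, $offset, null, $encoding);

    return $parts;
}

// Same data as in the test script, written with escapes so that the
// encoding of this source file does not matter.
$input = mb_convert_encoding(
    "\u{2126}\u{2621}\u{2126}\u{2621}\u{2126}\u{2621}\u{2626}abc",
    'UTF-16BE',
    'UTF-8'
);
$delimiter = mb_convert_encoding("\u{2626}", 'UTF-16BE', 'UTF-8');

$parts = mb_explode_sketch($delimiter, $input, 'UTF-16BE');
foreach ($parts as $part) {
    echo bin2hex($part), "\n";
}
```

Every `mb_strpos` and `mb_substr` call works in character offsets, so each one has to walk the string again. That repeated work is presumably why this combination is slow on long inputs, and why a native implementation is being requested.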

Test script:
---------------
function printHexString($string) {
    foreach (str_split($string) as $character) {
        echo dechex(ord($character)), ' ';
    }
    echo "\n";
}

mb_internal_encoding('utf-8'); // must match the encoding of this source file; the literals below are converted from it
$input = mb_convert_encoding('Ω☡Ω☡Ω☡☦abc', 'utf-16'); // Ω is U+2126, ☡ is U+2621
$delimiter = mb_convert_encoding('☦', 'utf-16'); // ☦ is U+2626
printHexString($input);
echo "\n";

$exploded = explode($delimiter, $input);
foreach ($exploded as $element) {
    printHexString($element);
}

Expected result:
----------------
(assuming a hypothetical `mb_explode` used in place of `explode`)
21 26 26 21 21 26 26 21 21 26 26 21 26 26 0 61 0 62 0 63 

21 26 26 21 21 26 26 21 21 26 26 21
0 61 0 62 0 63 

Actual result:
--------------
21 26 26 21 21 26 26 21 21 26 26 21 26 26 0 61 0 62 0 63 

21 
21 21 
21 21 
21 
0 61 0 62 0 63
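
The extra splits in the actual result come from byte-wise matches that start in the middle of a UTF-16 code unit. A minimal sketch that makes this visible is shown below; it lists the byte offsets at which a plain `strpos` search finds the delimiter and marks those that do not fall on a 2-byte boundary. It assumes PHP 7 or later and uses `UTF-16BE` explicitly; the variable names are illustrative only.

```php
<?php
// Same data as in the test script, built from escapes.
$input = mb_convert_encoding(
    "\u{2126}\u{2621}\u{2126}\u{2621}\u{2126}\u{2621}\u{2626}abc",
    'UTF-16BE',
    'UTF-8'
);
$delimiter = mb_convert_encoding("\u{2626}", 'UTF-16BE', 'UTF-8');

// Collect non-overlapping byte-wise matches, scanning left to right
// in the same way a byte-oriented split would.
$byteOffsets = [];
$offset = 0;
while (($position = strpos($input, $delimiter, $offset)) !== false) {
    $byteOffsets[] = $position;
    $offset = $position + strlen($delimiter);
}

foreach ($byteOffsets as $position) {
    // In UTF-16 every code unit starts at an even byte offset.
    echo $position, ($position % 2 === 0 ? ' aligned' : ' misaligned'), "\n";
}
```

Only the match at byte offset 12 starts on a code unit boundary. The matches at offsets 1, 5 and 9 straddle two adjacent characters, which is what produces the four unexpected fragments.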

 