The multibyte split function mb_split() does not appear to be working as expected.
Instead of splitting the multibyte string into array elements as specified, it splits inside characters at unpredictable positions, yielding garbled array elements.
The test script below gives the expected result for the first multibyte string but unexpected results for the next two.
Thank you for reviewing this potential bug.
Test script:
<?php
header('Content-Type: text/html; charset=UTF-8');

$arr = mb_split('\B', "你好");
print_r($arr); # Array ( [0] => 你 [1] => 好 ) ## Okay!

$arr = mb_split('\B', "你你");
print_r($arr); # Expected Result: Array ( [0] => 你 [1] => 你 )
## Instead, this message appears:
## Warning: mb_split(): mbregex search failure in mbsplit():
## no support in this configuration in /dir/foo.php on line 22

$arr = mb_split('\B', '隨著劇情的推進');
print_r($arr);
# Expected Result
# Array ( [0] => 隨 [1] => 著 [2] => 劇 [3] => 情 [4] => 的 [5] => 推 [6] => 進 )
# Actual Result, NOT expected
# Array ( [0] => 隨 [1] => � [2] => �劇 [3] => � [4] => �的 [5] => � [6] => �進 )
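As a possible workaround while mb_split() misbehaves, a string can be split into one element per character with PCRE instead of mbstring's regex engine. This is a minimal sketch, assuming valid UTF-8 input and PHP 7 or later; split_chars() is a hypothetical helper name, and the technique swaps in preg_split() rather than fixing mb_split() itself.

```php
<?php
// Workaround sketch: split a UTF-8 string into one element per character
// using PCRE (preg_split) instead of mbstring's regex engine.
// With the /u modifier the empty pattern matches only between whole
// characters, never inside a multibyte sequence; PREG_SPLIT_NO_EMPTY drops
// the empty pieces produced at the very start and end of the subject.
function split_chars(string $s): array
{
    $parts = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        // preg_split() returns false on error, e.g. for invalid UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $parts;
}

print_r(split_chars('隨著劇情的推進'));
```

On PHP 7.4 and later, mb_str_split() returns the same one-character-per-element array directly.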
Added Mac OS X 10.10 Yosemite to 'OS' for clarity.
It seems '\B' behaves differently than PCRE.
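For comparison, the sketch below lists the byte offsets at which PCRE's \B matches in the third test string. It assumes that PHP's /u modifier makes PCRE read the subject as UTF-8 and treat CJK ideographs as word characters; each ideograph in this string is 3 bytes long. Under that assumption \B should match only between two characters, never at the start or end of the subject and never inside a character.

```php
<?php
// Sketch: list the byte offsets at which PCRE's \B matches in the test string.
// Assumes the /u modifier makes PCRE read the subject as UTF-8 and treat CJK
// ideographs as word characters; each ideograph here is 3 bytes long.
$subject = '隨著劇情的推進';
preg_match_all('/\B/u', $subject, $m, PREG_OFFSET_CAPTURE);

// $m[0] holds [matched text, byte offset] pairs; keep just the offsets.
$offsets = array_column($m[0], 1);
print_r($offsets);
```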
> It seems '\B' behaves differently than PCRE.
Indeed, and I'd call that difference a bug, see
Basically, the problem is that neither mb_split() nor
mb_ereg_replace() properly handles matches of length 0, so \B is
matched at the very start of the subject and also 1 position
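The zero-length-match problem can be illustrated with a split loop that handles empty matches explicitly. This is a hypothetical sketch in plain PHP (7 or later), not mbstring's actual C implementation; split_utf8() and utf8_char_len() are illustrative names, and the input is assumed to be valid UTF-8. The key step is that after an empty match the next search resumes one whole UTF-8 character later rather than one byte later, and an empty match at the start of a piece or at the end of the subject does not produce a split.

```php
<?php
// Hypothetical sketch, NOT mbstring's real implementation: a regex split
// loop for UTF-8 subjects that handles zero-length matches such as \B.

// Byte length of the UTF-8 character whose lead byte is at offset $pos.
function utf8_char_len(string $s, int $pos): int
{
    $b = ord($s[$pos]);
    if ($b < 0x80) {
        return 1;                      // ASCII
    }
    if (($b & 0xE0) === 0xC0) {
        return 2;                      // 2-byte sequence
    }
    if (($b & 0xF0) === 0xE0) {
        return 3;                      // 3-byte sequence (e.g. CJK)
    }
    if (($b & 0xF8) === 0xF0) {
        return 4;                      // 4-byte sequence
    }
    return 1;                          // invalid lead byte: step one byte
}

function split_utf8(string $pattern, string $subject): array
{
    $len = strlen($subject);
    $pieces = [];
    $pieceStart = 0;                   // byte offset where the current piece begins
    $searchFrom = 0;                   // byte offset where the next search begins

    while ($searchFrom <= $len
        && preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $searchFrom) === 1) {
        $matchPos = $m[0][1];
        $matchLen = strlen($m[0][0]);

        if ($matchLen === 0) {
            if ($matchPos === $len) {
                break;                 // empty match at the end: no split
            }
            if ($matchPos !== $pieceStart) {
                // Empty match inside the subject: close the current piece here.
                $pieces[] = substr($subject, $pieceStart, $matchPos - $pieceStart);
                $pieceStart = $matchPos;
            }
            // Resume one whole character later, never one byte later.
            $searchFrom = $matchPos + utf8_char_len($subject, $matchPos);
        } else {
            // Non-empty match: the matched text is the delimiter.
            $pieces[] = substr($subject, $pieceStart, $matchPos - $pieceStart);
            $pieceStart = $searchFrom = $matchPos + $matchLen;
        }
    }
    $pieces[] = substr($subject, $pieceStart);
    return $pieces;
}

print_r(split_utf8('//', '隨著劇情的推進'));
```

With the empty pattern '//' (no /u modifier), every search matches immediately at its starting offset, so in this example it is the character-sized advance alone that keeps the multibyte characters intact.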
Related To: Bug #69256