Bug #19690 mb_split is broken
Submitted: 2002-10-01 08:44 UTC Modified: 2002-10-07 13:42 UTC
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: jc at mega-bucks dot co dot jp Assigned:
Status: Closed Package: mbstring related
PHP Version: 4.2.3 OS: Red Hat Linux 7.2
Private report: No CVE-ID: None

 [2002-10-01 08:44 UTC] jc at mega-bucks dot co dot jp
The following output and code show that mb_split does not work:


REGEX encoding is EUC-JP
encoding is ASCII
v is One two
a word: **
a word: **
a word: **
a word: **
a word: **
a word: **
a word: **
a word: **
a word: **


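// Show the regex encoding and the detected encoding of the input.
$aWords = array();
echo " REGEX encoding is ". mb_regex_encoding()."<BR>";
$v ="One two";
echo "encoding is ".mb_detect_encoding($v)."<BR>";
echo "v is $v <BR>";
// Split on a single unescaped space; this is the call that misbehaves.
$aWords = mb_split(" ",$v);
echo "COUNT: ".count($aWords)."<BR>";
foreach($aWords as $w) {
  echo "a word: *$w*<BR>";
}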


 [2002-10-01 08:52 UTC] jc at mega-bucks dot co dot jp
Here are my PHP settings in case you need to see them

Multibyte (Japanese) Support enabled
multibyte regex support enabled

Directive                     Local Value Master Value
mbstring.detect_order         auto        auto
mbstring.func_overload        0           0
mbstring.http_input           auto        auto
mbstring.http_output          no value    no value
mbstring.internal_encoding    EUC-JP      EUC-JP
mbstring.substitute_character no value    no value
 [2002-10-01 13:09 UTC]
confirmed with HEAD.
 [2002-10-02 06:42 UTC]
In the current implementation, mb_split() and mb_ereg() interpret the regex pattern in extended mode, in which white space, carriage returns, and line feeds are ignored and any sequence beginning with "#" and terminated by "\n" is treated as a comment.
So if you would like to use these characters in the pattern, you should escape them with a backslash ('\').
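
A minimal sketch of the escaping advice above, assuming a PHP build with mbstring and multibyte regex support enabled. The space in the pattern is escaped with a backslash, so it should be matched literally whether or not extended mode is in effect. This has not been re-verified against the reporter's 4.2.3 build.

```php
<?php
// Sketch: escape the space so it is matched literally even when the
// pattern is compiled in extended mode (where bare whitespace is ignored).
$v = "One two";

// '\ ' in a single-quoted PHP string is a backslash followed by a space.
$aWords = mb_split('\ ', $v);

echo "COUNT: " . count($aWords) . "\n";
foreach ($aWords as $w) {
    echo "a word: *$w*\n";
}
```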

IMO this implicit behaviour is quite confusing, as we are more familiar with split() and preg_split().

 [2002-10-02 06:44 UTC]
Oops, I forgot to change the status.
 [2002-10-03 07:26 UTC] jc at mega-bucks dot co dot jp
I agree. Can this be added to the documentation then?
 [2002-10-07 13:42 UTC]
After some discussion, extended mode is no longer the default for mb_split() and mb_ereg_xxxx(). Instead, you can change the default behaviour with this newly introduced function:

proto mb_regex_set_options(string options)

(This is not documented yet; I hope to update the documentation soon.)
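
A hedged sketch of how the new function might be used. It assumes mb_regex_set_options() accepts an option string in which "x" enables extended mode, and that calling it without arguments returns the current option string, as in later PHP releases. The pattern 'a b' and the input "xaby" are made up for illustration and do not come from this report.

```php
<?php
// Sketch: toggle extended mode explicitly instead of relying on the default.
$subject = "xaby";   // illustrative input, not from the report

// Called without arguments, mb_regex_set_options() returns the current options.
$saved = mb_regex_set_options();

// Assuming the options in effect do not include "x": the space in 'a b'
// is a literal space, so nothing in "xaby" matches and no split happens.
$plain = mb_split('a b', $subject);

// Extended mode: unescaped whitespace in the pattern is ignored,
// so 'a b' behaves like 'ab'.
mb_regex_set_options('x');
$extended = mb_split('a b', $subject);

// Restore whatever options were active before.
mb_regex_set_options($saved);

var_dump($plain, $extended);
```

Saving and restoring the option string keeps the change local, since the options apply to every later multibyte regex call in the same request.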

Anyway, thank you for the report.
