Bug #19690 mb_split is broken
Submitted: 2002-10-01 08:44 UTC Modified: 2002-10-07 13:42 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: jc at mega-bucks dot co dot jp Assigned:
Status: Closed Package: mbstring related
PHP Version: 4.2.3 OS: Red Hat Linux 7.2
Private report: No CVE-ID: None

 [2002-10-01 08:44 UTC] jc at mega-bucks dot co dot jp
The following output and code show that mb_split does not work:

OUTPUT:

REGEX encoding is EUC-JP
encoding is ASCII
v is One two
COUNT: 9
a word: **
a word: **
a word: **
a word: **
a word: **
a word: **
a word: **
a word: **
a word: **

CODE:

$aWords = array();
echo " REGEX encoding is ". mb_regex_encoding()."<BR>";
$v ="One two";
echo "encoding is ".mb_detect_encoding($v)."<BR>";
echo "v is $v <BR>";
$aWords = mb_split(" ",$v);
echo "COUNT: ".count($aWords)."<BR>";
foreach($aWords as $w) {
  echo "a word: *$w*<BR>";
}
exit;

 [2002-10-01 08:52 UTC] jc at mega-bucks dot co dot jp
Here are my PHP settings in case you need to see them

Multibyte (Japanese) Support enabled
multibyte regex support enabled

Directive                     Local Value Master Value
mbstring.detect_order         auto        auto
mbstring.func_overload        0           0
mbstring.http_input           auto        auto
mbstring.http_output          no value    no value
mbstring.internal_encoding    EUC-JP      EUC-JP
mbstring.substitute_character no value    no value
 [2002-10-01 13:09 UTC] moriyoshi@php.net
confirmed with HEAD.
 [2002-10-02 06:42 UTC] moriyoshi@php.net
In the current implementation, mb_split() and mb_ereg() treat the regex pattern as an extended-mode pattern, in which white space, carriage returns, and line feeds are ignored, and any sequence beginning with "#" and terminated by "\n" is treated as a comment.
So if you would like to use these characters in the pattern, you should escape them with a backslash '\'.

IMO this implicit behaviour is quite confusing, as we are more familiar with split() and preg_split().
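
The escaping workaround described above can be sketched as follows. This is a minimal illustration, not code from the report: it assumes the mbstring extension is built with multibyte regex support, and the variable names are made up for the example. Both patterns avoid an unescaped space, so they should split the string whether or not extended mode is in effect.

```php
<?php
// Sketch only: assumes mbstring with multibyte regex support is available.
$v = "One two";

// A backslash-escaped space is a literal space, so extended mode does not drop it.
$byEscapedSpace = mb_split('\ ', $v);

// The \s class matches any whitespace character and is also unaffected by the mode.
$byWhitespaceClass = mb_split('\s', $v);

echo "COUNT: " . count($byEscapedSpace) . "\n";
foreach ($byEscapedSpace as $w) {
    echo "a word: *$w*\n";
}
```

Single-quoted PHP strings are used so that the backslash reaches the regex engine unchanged.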

 [2002-10-02 06:44 UTC] moriyoshi@php.net
oops i forgot to change the status.
 [2002-10-03 07:26 UTC] jc at mega-bucks dot co dot jp
I agree. Can this be added to the documentation then?
 [2002-10-07 13:42 UTC] moriyoshi@php.net
After some discussion, extended mode is no longer the default for mb_split() and mb_ereg_xxxx(). Instead, you can change the default behaviour with this newly introduced function:

proto mb_regex_set_options(string options)

(This is not documented yet... I hope to update the documentation soon.)
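
A hedged sketch of how the function might be used to opt back into extended mode for one call. The option letter "x" for extended mode and the no-argument form that returns the current options are taken from the mbstring documentation as I understand it, not from this report, so treat both as assumptions. The pattern uses \s so that the split works regardless of the mode.

```php
<?php
// Sketch only: assumes mbstring with multibyte regex support is available.
$v = "One two";

// Called without an argument, the function reports the current default options.
$saved = mb_regex_set_options();

// Assumed option letter: "x" turns extended mode on for later mb_ereg*/mb_split calls.
mb_regex_set_options('x');

// In extended mode an unescaped space in the pattern is ignored, so match with \s.
$words = mb_split('\s', $v);

// Put the previous defaults back so other code is not affected.
mb_regex_set_options($saved);

echo "COUNT: " . count($words) . "\n";
```

Saving and restoring the options keeps the change local, since the setting is global to the request.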

Anyway, thank you for the report.

 