Request #70363 improve performance of case-insensitive mb_*() operations
Submitted: 2015-08-26 16:07 UTC Modified: 2017-07-28 10:51 UTC
Avg. Score:5.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: Assigned:
Status: Open Package: mbstring related
PHP Version: 7.0.0RC1 OS: irrelevant
Private report: No CVE-ID: None


 [2015-08-26 16:07 UTC]
The mb_* functions seem to be much slower than the corresponding native string functions; there is plenty of data on this in the user-contributed notes of the manual (<>, <>). Today I read a post on this topic (<>) whose author uses a regular expression instead of mb_*, which, to my surprise, is much faster. Here is his code: <>. So I'm wondering why mb_* is so slow. Are users calling these functions incorrectly, or is there a problem in our implementation?


 [2015-08-27 11:57 UTC]
Interesting result.
It seems this is a worst case, and mbstring's case-insensitive code has performance issues in general.

I tried a case-sensitive version. The results vary, but the mb_* functions seem to have room for improvement. I consistently get slower results with current master on my physical PC (I suppose 3v4l runs in a VM or container).

If I use
 $raw = '0123456789abcdefghijklmnopqrstuvwxyzあいうえおかきくけこさしすせそなにぬねのはひふへほまみむめもやいゆえよわをん';
as the input, the case-sensitive mb_strstr() version is a little faster. BTW, PCRE has become a lot faster since 5.6, which is why mb_strstr() is only a little faster than the PCRE version.

If anyone could write a more generic benchmark script, i.e. one that benchmarks the mb_*() functions, it would be helpful.
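A minimal sketch of what such a benchmark script might look like. It is not the script from the linked post: the helper names bench() and pcre_strstr(), the needle, the haystack size and the iteration count are all assumptions chosen for illustration, and the regular expression is only one plausible PCRE equivalent of mb_strstr()/mb_stristr().

```php
<?php
// Sample text taken from the comment above; everything else is an assumption.
$raw = '0123456789abcdefghijklmnopqrstuvwxyzあいうえおかきくけこさしすせそなにぬねのはひふへほまみむめもやいゆえよわをん';

// Put the only match at the very end so each search scans the whole haystack.
$haystack = str_repeat($raw, 200) . 'Needle!';

// Hypothetical PCRE counterpart of mb_strstr() / mb_stristr():
// returns the haystack from the first match to the end, or false.
function pcre_strstr(string $haystack, string $needle, bool $caseInsensitive = false)
{
    $flags = $caseInsensitive ? 'isu' : 'su';
    $pattern = '/' . preg_quote($needle, '/') . '.*/' . $flags;
    return preg_match($pattern, $haystack, $m) === 1 ? $m[0] : false;
}

// Hypothetical timing helper: runs $fn $iterations times, prints and returns
// the elapsed wall-clock time in seconds.
function bench(string $label, callable $fn, int $iterations = 200): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    $elapsed = microtime(true) - $start;
    printf("%-24s %.6f s\n", $label, $elapsed);
    return $elapsed;
}

bench('mb_strstr', function () use ($haystack) {
    return mb_strstr($haystack, 'Needle', false, 'UTF-8');
});
bench('pcre (case-sensitive)', function () use ($haystack) {
    return pcre_strstr($haystack, 'Needle');
});
bench('mb_stristr', function () use ($haystack) {
    return mb_stristr($haystack, 'NEEDLE', false, 'UTF-8');
});
bench('pcre (case-insensitive)', function () use ($haystack) {
    return pcre_strstr($haystack, 'NEEDLE', true);
});
```

Timings will depend on the PHP version, the build and the machine, so the numbers are only meaningful when compared within a single run.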
 [2015-08-28 02:05 UTC]
-Summary: why mb_* so slow? +Summary: improve performance of case-insensitive mb_*() operations
 [2015-08-28 02:05 UTC]
Apparently, this is not a general mb_*() issue, but rather related
to case-insensitive operations, as Yasuo has pointed out. This seems
to be caused by the lack of native case-insensitive mbfl_*()
operations, so PHP converts $needle and $haystack to upper case in an
intermediate step, which adds some overhead.

Anyhow, this is not a bug, but rather a feature request to improve
the performance of case-insensitive mb_*() operations.
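
To illustrate the overhead described above, here is a userland sketch of the same strategy. It is not the actual mbstring C code, and the function name stripos_via_upper() is hypothetical. Both strings are upper-cased in full before an ordinary case-sensitive search runs, so two complete conversions (and two temporary copies) are paid for on every call.

```php
<?php
/**
 * Hypothetical illustration of a case-insensitive search emulated by
 * upper-casing both operands first.
 *
 * @return int|false Character offset of the first match, or false.
 */
function stripos_via_upper(string $haystack, string $needle, string $encoding = 'UTF-8')
{
    // Each conversion walks and copies the entire string, even when the
    // match sits near the start of the haystack.
    $upperHaystack = mb_strtoupper($haystack, $encoding);
    $upperNeedle = mb_strtoupper($needle, $encoding);

    // Only now does the actual (case-sensitive) search happen.
    // Note: the offset refers to the converted haystack; it matches the
    // original only when upper-casing does not change the character count.
    return mb_strpos($upperHaystack, $upperNeedle, 0, $encoding);
}

var_dump(stripos_via_upper('Hello Wörld', 'wÖR'));
```

A native case-insensitive comparison could in principle stop at the first match without converting the rest of the haystack, which is the improvement this request asks for.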
 [2015-08-28 02:08 UTC]
-Type: Bug +Type: Feature/Change Request
 [2017-07-28 10:51 UTC]
The good news is that the first test case (using mb_stristr) is now more than 6 times faster on master. The bad news is that this is still 3.5 times slower than the PCRE version.