Request #70363 improve performance of case-insensitive mb_*() operations
Submitted: 2015-08-26 16:07 UTC Modified: 2017-07-28 10:51 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: jhdxr@php.net Assigned:
Status: Open Package: mbstring related
PHP Version: 7.0.0RC1 OS: irrelevant
Private report: No CVE-ID: None
 [2015-08-26 16:07 UTC] jhdxr@php.net
Description:
------------
It seems the mb_* functions are much slower than the native string functions; there is a lot of data on this in the user-contributed notes of the manual (<http://php.net/mb_strlen>, <http://php.net/manual/en/function.mb-strtolower.php>). Today I read a post on this topic (<https://v2ex.com/t/216212>) whose author uses a regular expression instead of mb_*, which is, to my surprise, much faster. Here is his code: <https://3v4l.org/d8IAo>. So I'm wondering why mb_* is so slow. Are users calling these functions in the wrong way, or does our implementation have a problem?


 [2015-08-27 11:57 UTC] yohgaki@php.net
Interesting result.
It seems https://3v4l.org/d8IAo is a worst case, and mbstring's case-insensitive code has a performance issue in general.

I tried a case-sensitive version, https://3v4l.org/ZF3TK. The results vary, but it seems the mb_* functions have room for improvement. I consistently get slower results with current master on my real PC. (I suppose 3v4l runs in a VM or container.)

If I use 
 $raw = '0123456789abcdefghijklmnopqrstuvwxyzあいうえおかきくけこさしすせそなにぬねのはひふへほまみむめもやいゆえよわをん';
for https://3v4l.org/ZF3TK, the case-sensitive mb_strstr() version is a little faster. BTW, pcre became a lot faster than it was in 5.6. This is the reason why mb_strstr() is a little faster than the pcre version.

If anyone could write a more generic benchmark script, i.e. one that benchmarks the mb_*() functions, it would be helpful.
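
A minimal sketch of what such a generic benchmark script might look like is given here. It is not the script from either 3v4l link; the helper name bench(), the test strings, and the iteration count are all assumptions made for illustration, and absolute timings will vary by machine and PHP version.

```php
<?php
// Hypothetical helper (not part of PHP): run $fn $iterations times
// and return the elapsed wall-clock time in seconds.
function bench(callable $fn, int $iterations = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Assumed test data: a mixed ASCII/multibyte haystack with the match at the end.
$haystack = str_repeat('0123456789abcdefghijklmnopqrstuvwxyzあいうえお', 100) . 'NEEDLEん';
$needle   = 'needleん';

// Each entry pairs a label with the operation to time.
$cases = [
    'mb_stristr' => function () use ($haystack, $needle) {
        return mb_stristr($haystack, $needle, false, 'UTF-8');
    },
    'mb_stripos' => function () use ($haystack, $needle) {
        return mb_stripos($haystack, $needle, 0, 'UTF-8');
    },
    'preg_match /iu' => function () use ($haystack, $needle) {
        return preg_match('/' . preg_quote($needle, '/') . '/iu', $haystack);
    },
];

foreach ($cases as $label => $fn) {
    printf("%-16s %.4f s\n", $label, bench($fn));
}
```

Adding further mb_*() functions is a matter of adding entries to $cases; comparing several haystack shapes (ASCII only, multibyte only, match at start or end) would give a fairer picture than a single worst case.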
 [2015-08-28 02:05 UTC] cmb@php.net
-Summary: why mb_* so slow? +Summary: improve performance of case-insensitive mb_*() operations
 [2015-08-28 02:05 UTC] cmb@php.net
Apparently, this is not a general mb_*() issue, but rather related
to case-insensitive operations, as Yasuo has pointed out. It seems
that this is caused by the lack of native case-insensitive mbfl_*()
operations, so PHP converts $needle and $haystack to upper-case as
an intermediate step, which has some overhead.

Anyhow, this is not a bug, but rather a feature request to improve
the performance of case-insensitive mb_*() operations.
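
For illustration only, the convert-then-search approach described above can be approximated in userland as follows. This is a sketch, not the actual C implementation in mbstring; the function name stripos_via_upper() is hypothetical. It shows where the overhead comes from: both strings are converted in full before the search starts.

```php
<?php
// Hypothetical userland approximation of a case-insensitive search
// built from a case conversion plus a case-sensitive search.
// Returns the character offset of the first match, or false if none.
function stripos_via_upper(string $haystack, string $needle, string $encoding = 'UTF-8')
{
    // Both conversions walk the entire string and allocate a new copy,
    // regardless of where (or whether) the needle occurs.
    $upperHaystack = mb_strtoupper($haystack, $encoding);
    $upperNeedle   = mb_strtoupper($needle, $encoding);

    // Only now does the actual (case-sensitive) search run.
    return mb_strpos($upperHaystack, $upperNeedle, 0, $encoding);
}

var_dump(stripos_via_upper('0123あいうABC', 'abc')); // int(7)
```

A native case-insensitive comparison could instead compare characters as it scans, without building converted copies of the whole input first.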
 [2015-08-28 02:08 UTC] cmb@php.net
-Type: Bug +Type: Feature/Change Request
 [2017-07-28 10:51 UTC] nikic@php.net
The good news is that the first test case (using mb_stristr) is now more than 6 times faster on master. The bad news is that this is still 3.5 times slower than the PCRE version.
 