Bug #74935 mb_str*pos functions do not take into consideration mb_internal_encoding
Submitted: 2017-07-17 08:50 UTC Modified: 2017-07-23 11:00 UTC
From: andxfiles at gmail dot com Assigned: nikic (profile)
Status: Closed Package: mbstring related
PHP Version: Irrelevant OS: All
Private report: No CVE-ID: None
 [2017-07-17 08:50 UTC] andxfiles at gmail dot com
Description:
------------
When the encoding is set beforehand with mb_internal_encoding() instead of being passed as a parameter, the mb_str*pos functions (mb_stripos, mb_strrpos, etc.) do not seem to use the encoding provided, and so remain much slower.
As the actual results below show, every function becomes much faster when mb_internal_encoding() is used, except for mb_stripos.
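
For reference, the two ways of selecting the encoding that the report compares can be sketched as follows. This is a minimal illustration rather than part of the original test script: it only checks that both variants return the same values, so any difference between them is one of speed. The Greek letters are written as `\u{...}` escapes (an assumption that requires PHP 7.0 or later) so that the result does not depend on the encoding of the source file.

```php
<?php
// "\u{03AC}" is the Greek letter "ά" and "\u{03B1}" is "α" (PHP 7.0+ escapes).
$haystack = "fdsfdssdfoifjosdifjosdifjosdij:\u{03AC}";
$needle   = "\u{03B1}";

// Variant 1: pass the encoding explicitly on every call.
$explicit = [
    mb_strlen($haystack, "UTF-8"),
    mb_stripos($haystack, $needle, 0, "UTF-8"),
    mb_substr($haystack, 31, 1, "UTF-8"),
];

// Variant 2: set the default encoding once and omit the parameter.
mb_internal_encoding("UTF-8");
$implicit = [
    mb_strlen($haystack),
    mb_stripos($haystack, $needle, 0),
    mb_substr($haystack, 31, 1),
];

// Both variants are expected to produce identical results.
var_dump($explicit === $implicit);
```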


Test script:
---------------
<?php
$starttime = microtime(true);
    for ($i=0; $i<100000; $i++) {
        $a = mb_strlen("fdsfdssdfoifjosdifjosdifjosdij:ά", "UTF-8");
    }
$finishtime = microtime(true);
echo "mb_strlen with parameter UTF8: " . number_format($finishtime - $starttime, 4)*1000 ." milliseconds<br/>";

$starttime = microtime(true);
    for ($i=0; $i<100000; $i++) {
        $a = mb_stripos("fdsfdssdfoifjosdifjosdifjosdij:ά", "α", 0, "UTF-8");
    }
$finishtime = microtime(true);
echo "mb_stripos with parameter UTF8: " . number_format($finishtime - $starttime, 4)*1000 ." milliseconds<br/>";


$starttime = microtime(true);
    for ($i=0; $i<100000; $i++) {
        $a = mb_substr("fdsfdssdfoifjosdifjosdifjosdij:ά", $i, 1, "UTF-8");
    }
$finishtime = microtime(true);
echo "mb_substr with parameter UTF8: " . number_format($finishtime - $starttime, 4)*1000 ." milliseconds<br/>";

mb_internal_encoding("UTF-8");

$starttime = microtime(true);
    for ($i=0; $i<100000; $i++) {
        $a = mb_strlen("fdsfdssdfoifjosdifjosdifjosdij:ά");
    }
$finishtime = microtime(true);
echo "mb_strlen with mb_internal_encoding UTF8: " . number_format($finishtime - $starttime, 4)*1000 ." milliseconds<br/>";

$starttime = microtime(true);
    for ($i=0; $i<100000; $i++) {
        $a = mb_stripos("fdsfdssdfoifjosdifjosdifjosdij:ά", "α", 0);
    }
$finishtime = microtime(true);
echo "mb_stripos with mb_internal_encoding UTF8: " . number_format($finishtime - $starttime, 4)*1000 ." milliseconds<br/>";

$starttime = microtime(true);
    for ($i=0; $i<100000; $i++) {
        $a = mb_substr("fdsfdssdfoifjosdifjosdifjosdij:ά", $i, 1);
    }
$finishtime = microtime(true);
echo "mb_substr with mb_internal_encoding UTF8: " . number_format($finishtime - $starttime, 4)*1000 ." milliseconds<br/>";
?>

Actual result:
--------------
mb_strlen with parameter UTF8: 103.7 milliseconds
mb_stripos with parameter UTF8: 1450.7 milliseconds
mb_substr with parameter UTF8: 81.6 milliseconds
mb_strlen with mb_internal_encoding UTF8: 39.2 milliseconds
mb_stripos with mb_internal_encoding UTF8: 1435.3 milliseconds
mb_substr with mb_internal_encoding UTF8: 48.5 milliseconds
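
The six timing loops in the test script all follow the same pattern, so the measurement can also be written more compactly. The following is only a sketch: `bench()` is a hypothetical helper that is not part of the original script, and the extra closure call per iteration adds overhead, so absolute numbers will differ from the ones reported here.

```php
<?php
// Hypothetical helper: calls $fn $iterations times and returns the
// elapsed wall-clock time in milliseconds.
function bench(callable $fn, $iterations = 100000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($i);
    }
    return (microtime(true) - $start) * 1000;
}

$s = "fdsfdssdfoifjosdifjosdifjosdij:\u{03AC}"; // ends in Greek "ά"
$n = "\u{03B1}";                                 // Greek "α"

// Calls that pass the encoding as a parameter.
$explicit = [
    "mb_strlen"  => function ($i) use ($s) { return mb_strlen($s, "UTF-8"); },
    "mb_stripos" => function ($i) use ($s, $n) { return mb_stripos($s, $n, 0, "UTF-8"); },
    "mb_substr"  => function ($i) use ($s) { return mb_substr($s, $i, 1, "UTF-8"); },
];

// The same calls relying on mb_internal_encoding().
$implicit = [
    "mb_strlen"  => function ($i) use ($s) { return mb_strlen($s); },
    "mb_stripos" => function ($i) use ($s, $n) { return mb_stripos($s, $n, 0); },
    "mb_substr"  => function ($i) use ($s) { return mb_substr($s, $i, 1); },
];

foreach ($explicit as $name => $fn) {
    printf("%s with parameter UTF8: %.1f milliseconds\n", $name, bench($fn));
}

mb_internal_encoding("UTF-8");

foreach ($implicit as $name => $fn) {
    printf("%s with mb_internal_encoding UTF8: %.1f milliseconds\n", $name, bench($fn));
}
```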

 [2017-07-23 08:44 UTC] nikic@php.net
-Assigned To: +Assigned To: nikic
 [2017-07-23 11:00 UTC] nikic@php.net
-Status: Assigned +Status: Closed
 [2017-07-23 11:00 UTC] nikic@php.net
A number of performance improvements for mb_stripos have been implemented.

Output of your test code on PHP 7.1:

mb_strlen with parameter UTF8: 38.2 milliseconds
mb_stripos with parameter UTF8: 705.8 milliseconds
mb_substr with parameter UTF8: 42.3 milliseconds
mb_strlen with mb_internal_encoding UTF8: 15.1 milliseconds
mb_stripos with mb_internal_encoding UTF8: 663.6 milliseconds
mb_substr with mb_internal_encoding UTF8: 20.9 milliseconds

Output on master:

mb_strlen with parameter UTF8: 12.4 milliseconds
mb_stripos with parameter UTF8: 83.4 milliseconds
mb_substr with parameter UTF8: 19.5 milliseconds
mb_strlen with mb_internal_encoding UTF8: 11.7 milliseconds
mb_stripos with mb_internal_encoding UTF8: 82.4 milliseconds
mb_substr with mb_internal_encoding UTF8: 17.8 milliseconds

For your particular example this gives an approximately 8x improvement.
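
The factor can be checked directly from the mb_stripos timings in the two listings above. A small sketch (the array and variable names are chosen for illustration only):

```php
<?php
// mb_stripos timings in milliseconds, copied from the PHP 7.1 and master listings.
$php71  = ["parameter" => 705.8, "mb_internal_encoding" => 663.6];
$master = ["parameter" => 83.4,  "mb_internal_encoding" => 82.4];

$speedup = [];
foreach ($php71 as $variant => $before) {
    // Speedup = time before the change divided by time after it.
    $speedup[$variant] = $before / $master[$variant];
    printf("mb_stripos with %s: %.2fx faster\n", $variant, $speedup[$variant]);
}
// Prints roughly 8.46x and 8.05x, matching the "approximately 8x" figure.
```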

Whether these changes will be in PHP 7.2 is yet to be seen.
 [2017-07-24 06:51 UTC] andxfiles at gmail dot com
Indeed this report is the same as https://bugs.php.net/bug.php?id=74933 so you can definitely close one of the two.

Here are the results of the script with the new patches from master, which unfortunately show no change: mb_stripos with mb_internal_encoding is still as slow as before.

The OS is Windows Server 2012 Standard 64-bit, and PHP is x86 (32-bit). The same results come out on Windows 7 64-bit with PHP x86 (32-bit). @nikic, did you try it under Linux or Windows?

7.1.1
mb_strlen with parameter UTF8: 290.4 milliseconds
mb_stripos with parameter UTF8: 3250.2 milliseconds
mb_substr with parameter UTF8: 295.9 milliseconds
mb_strlen with mb_internal_encoding UTF8: 15.1 milliseconds
mb_stripos with mb_internal_encoding UTF8: 3292.2 milliseconds
mb_substr with mb_internal_encoding UTF8: 21.9 milliseconds

7.1.9-dev (master)
mb_strlen with parameter UTF8: 290.2 milliseconds
mb_stripos with parameter UTF8: 3245.3 milliseconds
mb_substr with parameter UTF8: 300.3 milliseconds
mb_strlen with mb_internal_encoding UTF8: 16.1 milliseconds
mb_stripos with mb_internal_encoding UTF8: 3240.4 milliseconds
mb_substr with mb_internal_encoding UTF8: 23.8 milliseconds
 [2017-07-24 07:00 UTC] spam2 at rhsoft dot net
You need master, not 7.1.9-dev. There is an ongoing discussion on the developer mailing list about whether the mbstring refactoring could be included in 7.2.
 [2017-07-24 22:40 UTC] andxfiles at gmail dot com
My bad, I had not downloaded the master version; it was far down the page.
I can confirm that this issue is solved in the master branch.

5.3.28:
mb_strlen with parameter UTF8: 152.1 milliseconds
mb_stripos with parameter UTF8: 1810.2 milliseconds
mb_substr with parameter UTF8: 161.8 milliseconds
mb_strlen with mb_internal_encoding UTF8: 24 milliseconds
mb_stripos with mb_internal_encoding UTF8: 1800.8 milliseconds
mb_substr with mb_internal_encoding UTF8: 31.9 milliseconds

7.1.1:
mb_strlen with parameter UTF8: 288.6 milliseconds
mb_stripos with parameter UTF8: 3242 milliseconds
mb_substr with parameter UTF8: 295.2 milliseconds
mb_strlen with mb_internal_encoding UTF8: 14.7 milliseconds
mb_stripos with mb_internal_encoding UTF8: 3251.9 milliseconds
mb_substr with mb_internal_encoding UTF8: 21 milliseconds

7.3.0-dev (master)
mb_strlen with parameter UTF8: 30.7 milliseconds
mb_stripos with parameter UTF8: 139.3 milliseconds
mb_substr with parameter UTF8: 41.6 milliseconds
mb_strlen with mb_internal_encoding UTF8: 12.8 milliseconds
mb_stripos with mb_internal_encoding UTF8: 138.8 milliseconds
mb_substr with mb_internal_encoding UTF8: 22 milliseconds


Thank you for the quick reply and the very quick fix. Hopefully we will see this soon in the next version!
 