php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #77556 COUNT_RECURSIVE is suprisingly slow
Submitted: 2019-02-01 14:52 UTC Modified: 2019-02-01 14:58 UTC
From: ivan dot enderlin at hoa-project dot net Assigned:
Status: Not a bug Package: Performance problem
PHP Version: 7.3.1 OS: macOS
Private report: No CVE-ID: None
Welcome back! If you're the original bug submitter, here's where you can edit the bug or add additional notes.
If this is not your bug, you can add a comment by following this link.
If this is your bug, but you forgot your password, you can retrieve your password here.
Password:
Status:
Package:
Bug Type:
Summary:
From: ivan dot enderlin at hoa-project dot net
New email:
PHP Version: OS:

 

 [2019-02-01 14:52 UTC] ivan dot enderlin at hoa-project dot net
Description:
------------
I discovered `COUNT_RECURSIVE` and thought it would speed up my algorithm. But instead, it slowed it down. It initially used `array_sum` + `array_map`. See the test script.

This result is surprising because `array_map` creates a new array, so the memory allocations should be slower than just iterating and counting. It proofs to be true with the last test in the script (see “manual”) where it is faster than all the previous solutions.

Test script:
---------------
<?php

const LENGTH = 100000;

$a = array_fill(0, LENGTH, array_fill(0, LENGTH, 42));

echo 'expected:            '; var_dump(LENGTH * LENGTH);
echo "\n\n";

$_ = microtime(true);
echo 'count recursive:     '; var_dump(count($a, COUNT_RECURSIVE) - count($a));
echo 'time: ', number_format((microtime(true) - $_) / 1000, 6), "\n\n";

$_ = microtime(true);
echo 'sum + map:           '; var_dump(array_sum(array_map('count', $a)));
echo 'time: ', number_format((microtime(true) - $_) / 1000, 6), "\n\n";

$_ = microtime(true);
echo 'sum + map + closure: '; var_dump(array_sum(array_map(function (array $v) { return count($v); }, $a)));
echo 'time: ', number_format((microtime(true) - $_) / 1000, 6), "\n\n";

$_ = microtime(true);
echo 'manual:              ';
$i = 0;
foreach ($a as $aa) {
    $i += count($aa);
}
var_dump($i);
echo 'time: ', number_format((microtime(true) - $_) / 1000, 6), "\n\n";


/**
 * Output:
 *
 *     expected:            int(10000000000)
 *
 *
 *     count recursive:     int(10000000000)
 *     time: 0.012170
 *
 *     sum + map:           int(10000000000)
 *     time: 0.000005
 *
 *     sum + map + closure: int(10000000000)
 *     time: 0.000007
 *
 *     manual:              int(10000000000)
 *     time: 0.000002
 */


Expected result:
----------------
`COUNT_RECURSIVE` should be as fast as or faster than `array_sum` + `array_map`.

`COUNT_RECURSIVE` should be as fast as the “manual” test, or even faster because it's written in C directly.

Actual result:
--------------
`COUNT_RECURSIVE` is way too slow :-).

Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2019-02-01 14:58 UTC] nikic@php.net
-Status: Open +Status: Not a bug
 [2019-02-01 14:58 UTC] nikic@php.net
COUNT_RECURSIVE also needs to iterate over the inner arrays to check whether they contain arrays that also need to be recursively counted. You are using domain knowledge (that arrays are singly-nested) to avoid that extra recursion step, and can thus do this much more efficiently. It's basically 100000 vs 100000^2 operations.
 [2019-02-01 15:13 UTC] ivan dot enderlin at hoa-project dot net
Indeed.

```
function f(array $a): int {
    $i = count($a);

    foreach ($a as $aa) {
        if (is_array($aa)) {
            $i += f($aa);
        }
    }

    return $i;
}
```

is 10 times slower than `COUNT_RECURSIVE` and does exactly the same.

I guess having a type to represent homogeneous arrays would help a lot here.

Thanks for your answer.
 
PHP Copyright © 2001-2020 The PHP Group
All rights reserved.
Last updated: Fri Dec 04 02:01:23 2020 UTC