Bug #81554 RecursiveIteratorIterator still calls ->getChildren() when depth reaches limit
Submitted: 2021-10-25 15:39 UTC Modified: 2021-11-02 12:21 UTC
Avg. Score: 4.7 ± 0.5
Reproduced: 5 of 6 (83.3%)
Same Version: 4 (80.0%)
Same OS: 5 (100.0%)
From: dktapps at pmmp dot io Assigned: cmb (profile)
Status: Wont fix Package: SPL related
PHP Version: 8.0.12 OS: Windows
Private report: No CVE-ID: None

 [2021-10-25 15:39 UTC] dktapps at pmmp dot io
When a RecursiveIteratorIterator's depth reaches the limit, it may still call its sub-iterator's getChildren(). This manifests as performance degradation when running the script below on a directory containing many thousands of files.

This can be observed by replacing the iterators with a `FilesystemIterator`, which by default won't recurse anyway; the result is two orders of magnitude faster than a RecursiveIteratorIterator with max depth 0.

With the target folder containing 30k files (NTFS on a PCIe Gen4 SSD):
- RecursiveDirectoryIterator + maxDepth(0) takes 3.7 seconds
- FilesystemIterator takes 0.03 seconds.

This is most observable on Windows due to Windows' abysmal I/O performance.

Test script:
Slow script:

$iterator = new RecursiveDirectoryIterator(sys_get_temp_dir() . '/phpstan/cache/nette.configurator');
$iterator2 = new RecursiveIteratorIterator($iterator);
$iterator2->setMaxDepth(0);
$start = hrtime(true);
foreach($iterator2 as $item){
    // consume the iterator
}
var_dump(number_format(hrtime(true) - $start));

Fast script:

$iterator2 = new FilesystemIterator(sys_get_temp_dir() . '/phpstan/cache/nette.configurator');
$start = hrtime(true);
foreach($iterator2 as $item){
    // consume the iterator
}
var_dump(number_format(hrtime(true) - $start));

Expected result:
The two scripts should perform within the same order of magnitude.

Actual result:
The fast script is more than 100x faster than the slow one.
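A self-contained version of the comparison can be sketched as follows. The phpstan cache path in the report is machine-specific, so this sketch builds its own throwaway directory (`bug81554_demo`, a made-up name) with 100 files and counts entries with both strategies; `SKIP_DOTS` is passed to the recursive variant so both see the same entries.

```php
<?php
// Sketch of the two scripts from the report, made reproducible:
// create a temp directory with 100 files, then iterate it both ways.
$dir = sys_get_temp_dir() . '/bug81554_demo';
if (!is_dir($dir)) {
    mkdir($dir);
}
for ($i = 0; $i < 100; $i++) {
    touch($dir . '/file' . $i . '.txt');
}

// Slow variant: recursive iteration capped at depth 0.
$inner = new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS);
$recursive = new RecursiveIteratorIterator($inner);
$recursive->setMaxDepth(0);
$start = hrtime(true);
$slowCount = iterator_count($recursive);
$slowNs = hrtime(true) - $start;

// Fast variant: plain single-level iteration (SKIP_DOTS is the default).
$flat = new FilesystemIterator($dir);
$start = hrtime(true);
$fastCount = iterator_count($flat);
$fastNs = hrtime(true) - $start;

// Both variants see the same 100 files; only the timing differs.
printf("slow=%d fast=%d\n", $slowCount, $fastCount);
```

The exact speed ratio depends on the filesystem and OS; the report's 100x gap was measured on NTFS, where the extra stat calls are especially costly.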


 [2021-10-28 14:51 UTC]
-Status: Open  +Status: Feedback
-Package: *Directory/Filesystem functions  +Package: SPL related
-Assigned To:  +Assigned To: cmb
 [2021-10-28 14:51 UTC]
Actually, the inner iterator's ::hasChildren() is called (not
::getChildren())[1].  This is necessary even when the max depth is
reached, to determine whether to include the item in the iteration
(if it has no children, and ::LEAVES_ONLY is set) or not.
Unfortunately, RecursiveDirectoryIterator::hasChildren() may cause
up to two stat calls, and these are indeed particularly slow on
Windows.

The only possible optimization I see would be not to call
::hasChildren() when ::LEAVES_ONLY is not set, but that wouldn't
help in your case, nor in many others, since ::LEAVES_ONLY is the
default and likely used most of the time.

Do you agree that this edge-case is not worth optimizing?

[1] <>
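The behavior described above can be made visible with a small sketch. `CountingIterator` is a hypothetical class (not part of SPL) that records how often the outer RecursiveIteratorIterator probes hasChildren(), even though setMaxDepth(0) forbids any descent.

```php
<?php
// Hedged sketch (CountingIterator is hypothetical): demonstrates that
// hasChildren() is still probed for every element at max depth 0,
// because LEAVES_ONLY needs it to decide whether an item is a leaf.
class CountingIterator extends ArrayIterator implements RecursiveIterator
{
    public int $calls = 0;

    public function hasChildren(): bool
    {
        $this->calls++;      // counts probes by the outer iterator
        return false;        // every element is a leaf
    }

    public function getChildren(): RecursiveIterator
    {
        return new self([]); // never reached: hasChildren() is false
    }
}

$inner = new CountingIterator(['a', 'b', 'c']);
$outer = new RecursiveIteratorIterator($inner); // LEAVES_ONLY is the default
$outer->setMaxDepth(0);

foreach ($outer as $item) {
    // consume the leaves
}

// hasChildren() was probed at least once per element despite the depth cap
var_dump($inner->calls);
```

With RecursiveDirectoryIterator as the inner iterator, each of these probes can translate into stat calls, which is where the slowdown in the report comes from.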
 [2021-10-28 16:01 UTC] dktapps at pmmp dot io
-Status: Feedback +Status: Assigned
 [2021-10-28 16:01 UTC] dktapps at pmmp dot io
I guess you're right. I originally discovered this issue in PHPStan, which uses Symfony/Finder to discover dead files in its cache; the cache is zero levels deep and only contains files. But I don't think Symfony uses LEAVES_ONLY for this (it goes through some complicated FilterIterator setup), so I guess it wouldn't see any benefit anyway.
 [2021-11-02 12:21 UTC]
-Status: Assigned +Status: Wont fix
 [2021-11-02 12:21 UTC]
Okay, I'm closing as WONTFIX (for the most part this is not a bug,
but since some edge-cases could be improved …)
PHP Copyright © 2001-2023 The PHP Group
All rights reserved.
Last updated: Sun Dec 10 11:01:26 2023 UTC