Bug #81554 RecursiveIteratorIterator still calls ->getChildren() when depth reaches limit
Submitted: 2021-10-25 15:39 UTC Modified: 2021-11-02 12:21 UTC
Votes:6
Avg. Score:4.7 ± 0.5
Reproduced:5 of 6 (83.3%)
Same Version:4 (80.0%)
Same OS:5 (100.0%)
From: dktapps at pmmp dot io Assigned: cmb (profile)
Status: Wont fix Package: SPL related
PHP Version: 8.0.12 OS: Windows
Private report: No CVE-ID: None
 [2021-10-25 15:39 UTC] dktapps at pmmp dot io
Description:
------------
When a RecursiveIteratorIterator's depth reaches the limit, it may still call its sub-iterator's getChildren(). This manifests as performance degradation when running the script below on a directory containing many thousands of files.

This can be observed by replacing the iterators with a `FilesystemIterator`, which does not recurse by default anyway. The result is two orders of magnitude faster than a RecursiveIteratorIterator with max depth 0.

With the target folder containing 30k files (NTFS on a PCIe Gen4 SSD):
- RecursiveDirectoryIterator + setMaxDepth(0) takes 3.7 seconds.
- FilesystemIterator takes 0.03 seconds.

This is most observable on Windows due to Windows' abysmal I/O performance.

Test script:
---------------
Slow script:
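<?php

// Recursive iterator limited to the top level (LEAVES_ONLY is the default mode).
$iterator = new RecursiveDirectoryIterator(sys_get_temp_dir() . '/phpstan/cache/nette.configurator');
$iterator2 = new RecursiveIteratorIterator($iterator);
$iterator2->setMaxDepth(0);
$start = hrtime(true);
foreach ($iterator2 as $item) {
    // Intentionally empty: only the iteration itself is timed.
}
// Elapsed time in nanoseconds.
var_dump(number_format(hrtime(true) - $start));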

--------
Fast script:
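<?php

// Flat iterator over the same directory; it never descends into subdirectories.
$iterator2 = new FilesystemIterator(sys_get_temp_dir() . '/phpstan/cache/nette.configurator');
$start = hrtime(true);
foreach ($iterator2 as $item) {
    // Intentionally empty: only the iteration itself is timed.
}
// Elapsed time in nanoseconds.
var_dump(number_format(hrtime(true) - $start));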


Expected result:
----------------
The two scripts should perform within the same order of magnitude.

Actual result:
--------------
The fast script is more than 100x faster than the slow one.

History

 [2021-10-28 14:51 UTC] cmb@php.net
-Status: Open
+Status: Feedback
-Package: *Directory/Filesystem functions
+Package: SPL related
-Assigned To:
+Assigned To: cmb
 [2021-10-28 14:51 UTC] cmb@php.net
Actually, the inner iterator's ::hasChildren() is called (not
::getChildren)[1].  This is necessary even when the max depth is
reached to determine whether to include the item in the iteration
(if it has no children, and ::LEAVES_ONLY is set) or not.
Unfortunately, RecursiveDirectoryIterator::hasChildren() may cause
up to two stat calls, and these are indeed particularly slow on
Windows.
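
The behaviour described above can be observed without touching the filesystem. The following is only a minimal sketch: `CountingIterator` is a hypothetical helper class written for this illustration (it is not part of SPL), and it wraps a small in-memory array instead of a directory. It counts how often RecursiveIteratorIterator calls hasChildren() and getChildren() when the max depth is 0 and the default ::LEAVES_ONLY mode is in effect.

```php
<?php

// Hypothetical helper for illustration only: counts the calls that
// RecursiveIteratorIterator makes on its inner iterator.
final class CountingIterator extends RecursiveArrayIterator
{
    public static int $hasChildrenCalls = 0;
    public static int $getChildrenCalls = 0;

    public function hasChildren(): bool
    {
        self::$hasChildrenCalls++;
        return parent::hasChildren();
    }

    public function getChildren(): ?RecursiveArrayIterator
    {
        self::$getChildrenCalls++;
        return parent::getChildren();
    }
}

// 'b' has children; 'a' and 'c' are leaves.
$data = ['a' => 1, 'b' => [2, 3], 'c' => 4];

// LEAVES_ONLY is the default mode of RecursiveIteratorIterator.
$it = new RecursiveIteratorIterator(new CountingIterator($data));
$it->setMaxDepth(0);

$keys = [];
foreach ($it as $key => $value) {
    $keys[] = $key;
}

printf(
    "yielded keys: %s, hasChildren() calls: %d, getChildren() calls: %d\n",
    implode(',', $keys),
    CountingIterator::$hasChildrenCalls,
    CountingIterator::$getChildrenCalls
);
```

With this input the loop should yield only the leaf keys `a` and `c`, with one hasChildren() call per top-level element and no getChildren() call at all. For a directory, each of those hasChildren() calls is where the stat calls come from.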

The only possible optimization I see would be not to call
::hasChildren() if ::LEAVES_ONLY is not set, but that wouldn't
help in your case, and in many other cases since ::LEAVES_ONLY is
the default and likely used most of the time.

Do you agree that this edge-case is not worth optimizing?

[1] <https://github.com/php/php-src/blob/php-7.4.25/ext/spl/spl_iterators.c#L266-L317>
 [2021-10-28 16:01 UTC] dktapps at pmmp dot io
-Status: Feedback
+Status: Assigned
 [2021-10-28 16:01 UTC] dktapps at pmmp dot io
I guess you're right. I originally discovered this issue in PHPStan, which uses Symfony/Finder to discover dead files in its cache, which is zero levels deep and only involves files. But I don't think Symfony uses LEAVES_ONLY for this (some complicated FilterIterator), so I guess it wouldn't see any benefit anyway.
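
For a directory that is known to be flat and to contain only files, as in the cache case described above, the flat iterator can be used directly. This is a rough sketch, not what Symfony/Finder or PHPStan actually do: `listFlatDirectory()` is a hypothetical helper name, and the temporary directory and file names in the usage part are made up for the demonstration. Unlike a RecursiveIteratorIterator in ::LEAVES_ONLY mode, it would also return subdirectories if any existed.

```php
<?php

// Hypothetical helper: returns the path names of all entries directly inside
// $dir, without calling hasChildren() for each entry.
function listFlatDirectory(string $dir): array
{
    $flags = FilesystemIterator::KEY_AS_PATHNAME
        | FilesystemIterator::CURRENT_AS_PATHNAME
        | FilesystemIterator::SKIP_DOTS;

    $paths = [];
    foreach (new FilesystemIterator($dir, $flags) as $path) {
        $paths[] = $path;
    }
    return $paths;
}

// Usage: build a small throwaway directory (names are arbitrary) and list it.
$demoDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'flat_demo_' . uniqid();
mkdir($demoDir);
foreach (['one.php', 'two.php', 'three.php'] as $name) {
    touch($demoDir . DIRECTORY_SEPARATOR . $name);
}

$found = array_map('basename', listFlatDirectory($demoDir));
sort($found);
var_dump($found);

// Clean up the throwaway directory again.
foreach (listFlatDirectory($demoDir) as $path) {
    unlink($path);
}
rmdir($demoDir);
```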
 [2021-11-02 12:21 UTC] cmb@php.net
-Status: Assigned
+Status: Wont fix
 [2021-11-02 12:21 UTC] cmb@php.net
Okay, I'm closing as WONTFIX (for the most part this is not a bug,
but since some edge-cases could be improved …)
 