Bug #81554 RecursiveIteratorIterator still calls ->getChildren() when depth reaches limit
Submitted: 2021-10-25 15:39 UTC Modified: 2021-11-02 12:21 UTC
Votes:6
Avg. Score:4.7 ± 0.5
Reproduced:5 of 6 (83.3%)
Same Version:4 (80.0%)
Same OS:5 (100.0%)
From: dktapps at pmmp dot io Assigned: cmb (profile)
Status: Wont fix Package: SPL related
PHP Version: 8.0.12 OS: Windows
Private report: No CVE-ID: None

 

 [2021-10-25 15:39 UTC] dktapps at pmmp dot io
Description:
------------
When a RecursiveIteratorIterator's depth reaches the limit, it may still call its sub-iterator's getChildren(). This shows up as severe performance degradation when running the script below on a directory containing many thousands of files.

This can be observed by replacing the iterators with a `FilesystemIterator`, which does not recurse by default anyway. The result is two orders of magnitude faster than a RecursiveIteratorIterator with a max depth of 0.

With the target folder containing 30k files (NTFS on a PCIe Gen4 SSD):
- RecursiveIteratorIterator over RecursiveDirectoryIterator with setMaxDepth(0) takes 3.7 seconds
- FilesystemIterator takes 0.03 seconds.

This is most noticeable on Windows because of its abysmal I/O performance.

Test script:
---------------
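Slow script:
<?php

$iterator = new RecursiveDirectoryIterator(sys_get_temp_dir() . '/phpstan/cache/nette.configurator');
// Default mode is RecursiveIteratorIterator::LEAVES_ONLY.
$iterator2 = new RecursiveIteratorIterator($iterator);
$iterator2->setMaxDepth(0); // do not descend into subdirectories
$start = hrtime(true);
foreach($iterator2 as $item){
    // intentionally empty: only the iteration itself is timed
}
var_dump(number_format(hrtime(true) - $start)); // elapsed time in nanoseconds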

--------
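Fast script:
<?php

// FilesystemIterator lists a single directory level and never recurses.
$iterator2 = new FilesystemIterator(sys_get_temp_dir() . '/phpstan/cache/nette.configurator');
$start = hrtime(true);
foreach($iterator2 as $item){
    // intentionally empty: only the iteration itself is timed
}
var_dump(number_format(hrtime(true) - $start)); // elapsed time in nanoseconds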


Expected result:
----------------
The two scripts should perform within the same order of magnitude.

Actual result:
--------------
The fast script is more than 100x faster than the slow one.
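For a reproducible comparison that does not depend on an existing PHPStan cache directory, the two test scripts can be combined into one that builds its own temporary tree. This is a sketch under stated assumptions: FILE_COUNT, the temporary directory name and the single subdirectory are arbitrary choices, and, unlike the original slow script, it passes FilesystemIterator::SKIP_DOTS to RecursiveDirectoryIterator so that "." and ".." are not counted. Absolute timings will vary by OS and filesystem and are only printed, not compared.

```php
<?php
// Sketch: self-contained variant of the slow/fast test scripts.
// FILE_COUNT and the directory layout are arbitrary illustration choices.
const FILE_COUNT = 1000;

$dir = sys_get_temp_dir() . '/rii-bench-' . bin2hex(random_bytes(4));
mkdir($dir);
for ($i = 0; $i < FILE_COUNT; $i++) {
    file_put_contents("$dir/file$i.txt", 'x');
}
mkdir("$dir/sub"); // one subdirectory, so there is something to skip
file_put_contents("$dir/sub/inner.txt", 'x');

// Slow variant: depth limited to 0, default LEAVES_ONLY mode.
// SKIP_DOTS is an addition compared to the original slow script.
$rii = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
);
$rii->setMaxDepth(0);
$start = hrtime(true);
$riiCount = iterator_count($rii);
$riiNs = hrtime(true) - $start;

// Fast variant: FilesystemIterator never recurses (SKIP_DOTS is on by default).
$start = hrtime(true);
$fsCount = iterator_count(new FilesystemIterator($dir));
$fsNs = hrtime(true) - $start;

printf("RecursiveIteratorIterator: %d entries, %s ns\n", $riiCount, number_format($riiNs));
printf("FilesystemIterator:        %d entries, %s ns\n", $fsCount, number_format($fsNs));

// Remove the temporary tree again.
unlink("$dir/sub/inner.txt");
rmdir("$dir/sub");
for ($i = 0; $i < FILE_COUNT; $i++) {
    unlink("$dir/file$i.txt");
}
rmdir($dir);
```

The entry counts differ by one: in LEAVES_ONLY mode the subdirectory is skipped as a non-leaf, while FilesystemIterator yields it like any other entry.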

History

 [2021-10-28 14:51 UTC] cmb@php.net
-Status: Open
+Status: Feedback
-Package: *Directory/Filesystem functions
+Package: SPL related
-Assigned To:
+Assigned To: cmb
 [2021-10-28 14:51 UTC] cmb@php.net
Actually, it is the inner iterator's ::hasChildren() that is called
(not ::getChildren())[1].  This call is necessary even when the max
depth has been reached, to determine whether the item is included in
the iteration: if ::LEAVES_ONLY is set, only items without children
are included.  Unfortunately, RecursiveDirectoryIterator::hasChildren()
may cause up to two stat calls, and these are indeed particularly slow
on Windows.
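A minimal userland sketch to observe this, assuming nothing beyond documented SPL classes: CountingArrayIterator is a hypothetical helper that overrides hasChildren() and getChildren() to count calls, and RecursiveArrayIterator stands in for RecursiveDirectoryIterator so that no filesystem access is involved. It relies on RecursiveIteratorIterator invoking the overridden methods of a user subclass.

```php
<?php
// Hypothetical helper: counts how often RecursiveIteratorIterator queries
// the inner iterator. RecursiveArrayIterator stands in for
// RecursiveDirectoryIterator, so no stat calls are involved here.
class CountingArrayIterator extends RecursiveArrayIterator
{
    public static int $hasChildrenCalls = 0;
    public static int $getChildrenCalls = 0;

    public function hasChildren(): bool
    {
        self::$hasChildrenCalls++;
        return parent::hasChildren();
    }

    public function getChildren(): ?RecursiveArrayIterator
    {
        self::$getChildrenCalls++;
        return parent::getChildren();
    }
}

$data = ['a' => 1, 'b' => [2, 3], 'c' => 4, 'd' => [5]];

// Default mode is RecursiveIteratorIterator::LEAVES_ONLY.
$it = new RecursiveIteratorIterator(new CountingArrayIterator($data));
$it->setMaxDepth(0); // never descend below the top level

$yielded = [];
foreach ($it as $key => $value) {
    $yielded[$key] = $value;
}

// hasChildren() runs once per top-level element, getChildren() never runs,
// and the array-valued elements are skipped as non-leaves.
printf(
    "hasChildren() calls: %d, getChildren() calls: %d\n",
    CountingArrayIterator::$hasChildrenCalls,
    CountingArrayIterator::$getChildrenCalls
); // hasChildren() calls: 4, getChildren() calls: 0
echo 'Yielded keys: ', implode(', ', array_keys($yielded)), "\n"; // Yielded keys: a, c
```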

The only possible optimization I see would be to skip the
::hasChildren() call at max depth when ::LEAVES_ONLY is not set, but
that wouldn't help in your case, nor in many other cases, since
::LEAVES_ONLY is the default mode and likely used most of the time.
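The trade-off behind that optimization can be sketched in userland. ProbeArrayIterator and probe() are hypothetical helpers, and RecursiveArrayIterator again stands in for RecursiveDirectoryIterator. The sketch shows that at max depth the answer from hasChildren() changes the output only in LEAVES_ONLY mode; in SELF_FIRST mode every element is yielded anyway, yet hasChildren() is still called for each of them.

```php
<?php
// Hypothetical helper: counts hasChildren() calls on the inner iterator.
class ProbeArrayIterator extends RecursiveArrayIterator
{
    public static int $calls = 0;

    public function hasChildren(): bool
    {
        self::$calls++;
        return parent::hasChildren();
    }
}

// Iterate $data at max depth 0 in the given mode; return the yielded keys
// and the number of hasChildren() calls made along the way.
function probe(array $data, int $mode): array
{
    ProbeArrayIterator::$calls = 0;
    $it = new RecursiveIteratorIterator(new ProbeArrayIterator($data), $mode);
    $it->setMaxDepth(0);
    $keys = [];
    foreach ($it as $key => $_) {
        $keys[] = $key;
    }
    return [$keys, ProbeArrayIterator::$calls];
}

$data = ['a' => 1, 'b' => [2, 3], 'c' => 4];

// LEAVES_ONLY: the hasChildren() answer decides that 'b' is skipped.
[$leafKeys, $leafCalls] = probe($data, RecursiveIteratorIterator::LEAVES_ONLY);
// SELF_FIRST: 'b' is yielded regardless, but hasChildren() is still called.
[$selfKeys, $selfCalls] = probe($data, RecursiveIteratorIterator::SELF_FIRST);

printf("LEAVES_ONLY: %s (%d calls)\n", implode(',', $leafKeys), $leafCalls); // LEAVES_ONLY: a,c (3 calls)
printf("SELF_FIRST:  %s (%d calls)\n", implode(',', $selfKeys), $selfCalls); // SELF_FIRST:  a,b,c (3 calls)
```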

Do you agree that this edge case is not worth optimizing?

[1] <https://github.com/php/php-src/blob/php-7.4.25/ext/spl/spl_iterators.c#L266-L317>
 [2021-10-28 16:01 UTC] dktapps at pmmp dot io
-Status: Feedback
+Status: Assigned
 [2021-10-28 16:01 UTC] dktapps at pmmp dot io
I guess you're right. I originally discovered this issue in PHPStan, which uses Symfony/Finder to find dead files in its cache; that directory is zero levels deep and contains only files. But I don't think Symfony uses LEAVES_ONLY for this (there's some complicated FilterIterator involved), so I guess it wouldn't see any benefit anyway.
 [2021-11-02 12:21 UTC] cmb@php.net
-Status: Assigned
+Status: Wont fix
 [2021-11-02 12:21 UTC] cmb@php.net
Okay, I'm closing as WONTFIX (for the most part this is not a bug,
but since some edge cases could be improved …)
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Mar 28 11:01:27 2024 UTC