Request #73887 Improve realpath cache eviction strategy
Submitted: 2017-01-07 13:45 UTC Modified: 2017-07-26 10:26 UTC
Votes:2
Avg. Score:3.5 ± 1.5
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: maggus dot staab at gmail dot com Assigned:
Status: Open Package: Performance problem
PHP Version: any OS:
Private report: No CVE-ID: None

 

 [2017-01-07 13:45 UTC] maggus dot staab at gmail dot com
Description:
------------
We recently changed the default realpath cache size as we realized that the default size is too small, see https://github.com/php/php-src/pull/2271.

After a closer look at the realpath cache implementation, krakjoe and bwoebi mentioned that its eviction strategy is very suboptimal. Cache items are only evicted after a find; entries are never evicted when they are simply no longer used.
PHP is forced to do expensive file I/O operations whenever the realpath cache cannot deliver the information from memory. This is especially costly for long-running PHP scripts, such as CLI-based processes or webservers.

A first improvement would be some kind of garbage collection that isn't bound to a previous find.
Second, the cache could benefit from some kind of LRU eviction.
Third, krakjoe mentioned that it might sometimes be useful to share the cache information in shm.
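
To make the first two ideas concrete, here is a minimal sketch of a fixed-size cache with a sweep that runs independently of lookups and with least-recently-used eviction on insert. It is an illustration only, not the php-src implementation: the names `rp_cache`, `rp_entry`, `rp_sweep`, `rp_find` and `rp_add`, the slot count, and the logical-clock approach to recency are all assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 4   /* hypothetical capacity, kept tiny for illustration */
#define PATH_LEN    256

/* One path -> resolved path mapping (hypothetical layout, not php-src's). */
typedef struct {
    char          path[PATH_LEN];
    char          resolved[PATH_LEN];
    time_t        expires;    /* absolute expiry time */
    unsigned long last_used;  /* logical clock value, larger = more recent */
    int           in_use;
} rp_entry;

typedef struct {
    rp_entry      slots[CACHE_SLOTS];
    unsigned long clock;      /* incremented on every insert and hit */
} rp_cache;

void rp_init(rp_cache *c) { memset(c, 0, sizeof *c); }

/* Garbage collection that is not tied to a lookup: drop every expired
 * entry and return how many slots were freed. */
int rp_sweep(rp_cache *c, time_t now) {
    int freed = 0;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->slots[i].in_use && c->slots[i].expires <= now) {
            c->slots[i].in_use = 0;
            freed++;
        }
    }
    return freed;
}

/* Lookup: returns the resolved path or NULL. A hit refreshes recency. */
const char *rp_find(rp_cache *c, const char *path, time_t now) {
    for (int i = 0; i < CACHE_SLOTS; i++) {
        rp_entry *e = &c->slots[i];
        if (e->in_use && strcmp(e->path, path) == 0) {
            if (e->expires <= now) { e->in_use = 0; return NULL; }
            e->last_used = ++c->clock;
            return e->resolved;
        }
    }
    return NULL;
}

/* Insert: overwrite the same key, else take a free slot, else evict the
 * least recently used entry. */
void rp_add(rp_cache *c, const char *path, const char *resolved,
            time_t now, time_t ttl) {
    rp_entry *target = NULL;
    for (int i = 0; i < CACHE_SLOTS && !target; i++)
        if (c->slots[i].in_use && strcmp(c->slots[i].path, path) == 0)
            target = &c->slots[i];
    for (int i = 0; i < CACHE_SLOTS && !target; i++)
        if (!c->slots[i].in_use)
            target = &c->slots[i];
    if (!target) {
        target = &c->slots[0];
        for (int i = 1; i < CACHE_SLOTS; i++)
            if (c->slots[i].last_used < target->last_used)
                target = &c->slots[i];
    }
    snprintf(target->path, PATH_LEN, "%s", path);
    snprintf(target->resolved, PATH_LEN, "%s", resolved);
    target->expires   = now + ttl;
    target->last_used = ++c->clock;
    target->in_use    = 1;
}
```

A long-running process could call `rp_sweep()` from a timer or between requests, so stale entries are released even if nothing ever looks them up again. The linear scans keep the sketch short; a real cache would pair a hash table with a recency list.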

Related discussion: http://chat.stackoverflow.com/transcript/message/34952738#34952738 Starting 5:32 PM


History

 [2017-03-07 14:19 UTC] kelunik@php.net
-Status: Open +Status: Assigned -Assigned To: +Assigned To: kelunik
 [2017-03-12 06:57 UTC] kelunik@php.net
-PHP Version: 7.1.0 +PHP Version: any
 [2017-07-26 09:36 UTC] kelunik@php.net
-Status: Assigned +Status: Open -Assigned To: kelunik +Assigned To:
 [2017-07-26 10:17 UTC] spam2 at rhsoft dot net
First, it would be much more important to have the cache shared between workers, which would fix the split-brain and the overhead of maintaining a separate cache in every process, with a very bad hit rate on machines with hundreds of workers.

See https://bugs.php.net/bug.php?id=73888. "We recently changed the default realpath cache" is a big mistake as long as every single worker has its own cache of that size and clearstatcache() after changing a file doesn't work at all with the current design.
 [2017-07-26 10:26 UTC] maggus dot staab at gmail dot com
Adding a shared cache would also imply a locking/concurrency mechanism, which could be less performant and more complex in comparison to what we have today.
 [2017-07-26 10:56 UTC] spam2 at rhsoft dot net
What performance do we really have?
Please read the bug report I linked.

When I make 4 calls to a website and get the numbers below, that says it all about the hit rate in real life and how high it could be. It takes a lot of requests to fill all those cache instances, and once they are filled, 'MaxRequestsPerChild' comes along and kills the process, starts a new one, and its cache is empty again. So what *real world* performance are we talking about?

clearstatcache(true, $path) is completely worthless because it resets the cache in only one of probably many hundreds of workers, which makes the whole cache dangerous; without that issue you could set "realpath_cache_ttl" to hours instead of short values.

With a realpath_cache_size of 4 MB you probably have 1000*4 MB in the case of 1000 workers, wasting 4 GB of RAM, triggering the oom-killer or crashing the whole machine instead of providing any benefit. So in summary, the locks that would probably be needed are the smallest problem of all.

echo count(realpath_cache_get());
325
6
57
1692
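
The memory estimate in this comment (1000 workers, each with a private 4 MB cache) is a simple product. The sketch below spells it out as an upper bound; the function name `worst_case_cache_bytes` is hypothetical, and the figure assumes every worker fills its cache completely, which the entry counts above suggest is rarely the case.

```c
#include <stdint.h>

/* Upper bound on realpath cache memory across all workers, assuming
 * each worker fills its private cache to the configured limit. */
uint64_t worst_case_cache_bytes(uint64_t workers, uint64_t cache_size_bytes) {
    return workers * cache_size_bytes;
}
```

With 1000 workers and a 4 MiB limit this gives 4,194,304,000 bytes, roughly the 4 GB quoted in the comment.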
 [2017-07-26 11:15 UTC] spam2 at rhsoft dot net
Someone pointed out off-list that this may not be clear:

Put a file containing this line on a large and busy server:
<?php echo count(realpath_cache_get());?>

Call that URL 4 times and you get the numbers below:
325
6
57
1692

You can hardly tell me that you get a useful hit rate with such randomness across workers.
 [2018-07-31 12:26 UTC] spam2 at rhsoft dot net
Has anyone considered fixing the design flaw described in https://bugs.php.net/bug.php?id=73888?

Having to rebase the patch onto current master just to enable the cache at all, because you guys think it's smart to hardcode that in two places instead of giving the admin an option, or at least a compile-time option, to state "we don't allow link/symlink anyways so the cache is safe even with open_basedir", was a reminder that it has a ton of problems.

/* Disable realpath cache if an open_basedir is set */
/*if (PG(open_basedir) && *PG(open_basedir)) { */
/*	CWDG(realpath_cache_size_limit) = 0; */
/*}*/
 
Last updated: Thu Mar 28 12:01:27 2024 UTC