Request #63689 Caching by hash not path
Submitted: 2012-12-04 18:20 UTC Modified: 2012-12-05 07:52 UTC
From: marrtins at hackers dot lv Assigned:
Status: Wont fix Package: APC (PECL)
PHP Version: 5.3.19 OS: Linux
Private report: No CVE-ID: None
 [2012-12-04 18:20 UTC] marrtins at hackers dot lv
Description:
------------
I have one system that hosts a large number of WordPress installations. Many of the files are identical but live at different paths. It would be nice to hash them and store them under that hash key instead of the path.


History

 [2012-12-04 18:26 UTC] rasmus@php.net
APC doesn't actually use the path as the key. It uses the inode. Hashing the 
entire file on every request would be prohibitively expensive.
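To illustrate what keying by inode means in practice, here is a minimal sketch. It is not APC's code; the helper names `inode_key` and `demo_inode_keys` are made up for this example, and pairing the device number with the inode is an assumption about what makes the key unique across filesystems. It shows that two copies of the same content get different keys, while a hard link shares the key of its original.

```python
import os
import tempfile


def inode_key(path):
    """Return (device, inode), the kind of identity a stat-based cache can key on."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo_inode_keys():
    """Compare the key of a file with the keys of a plain copy and a hard link."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.php")
        copy = os.path.join(tmp, "copy.php")
        link = os.path.join(tmp, "link.php")

        # Two separate files with identical content: two inodes.
        for path in (original, copy):
            with open(path, "w") as f:
                f.write("<?php echo 'hi';\n")

        # A hard link is a second name for the same inode.
        os.link(original, link)

        return {
            "copy_shares_key": inode_key(original) == inode_key(copy),
            "hardlink_shares_key": inode_key(original) == inode_key(link),
        }


if __name__ == "__main__":
    print(demo_inode_keys())
```

Getting the key costs one stat call per lookup, whereas a content hash would require reading the whole file.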
 [2012-12-04 18:26 UTC] rasmus@php.net
-Status: Open +Status: Wont fix
 [2012-12-04 18:35 UTC] marrtins at hackers dot lv
You don't need to hash on every request. How about calculating the hash only the first time an inode is accessed (or when its stat changes)? That way it would be possible to keep track of duplicates.
 [2012-12-04 18:39 UTC] rasmus@php.net
But you are asking to cache by hash. If the hash is the key it has to be 
calculated every time. If the hash is not the key, then I don't really see the 
point. If the key is still the inode I will still need a cache entry for each 
inode and then you haven't saved anything.
 [2012-12-04 18:46 UTC] marrtins at hackers dot lv
Yes, I find it hard to judge, and I am not that familiar with the inner workings of APC, but for duplicated content in a second inode you could store a simple pointer to the cached slot of the first inode. Hash each file/inode only once, when it is first accessed (or when its stat changes), and compare that hash with the hashes of files already cached. If the hash matches, store a pointer to the already cached slot rather than the whole file.
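The proposal above can be sketched roughly as follows. This is a toy model, not anything APC implements: the class `DedupCache`, its fields, and the use of SHA-256 are all assumptions made for illustration, raw file bytes stand in for compiled opcodes, and stale payloads are never evicted. The index is still keyed by (device, inode); only the payload is shared between entries whose content hash matches.

```python
import hashlib
import os
import tempfile


class DedupCache:
    """Toy cache: one index entry per inode, one payload per distinct content."""

    def __init__(self):
        self.by_inode = {}   # (dev, ino) -> ((mtime_ns, size), digest)
        self.by_hash = {}    # digest -> payload shared by all duplicates
        self.hash_count = 0  # how many times a file was actually hashed

    def get(self, path):
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        stamp = (st.st_mtime_ns, st.st_size)
        entry = self.by_inode.get(key)
        if entry is None or entry[0] != stamp:
            # First access, or the stat changed: read and hash once.
            with open(path, "rb") as f:
                data = f.read()
            digest = hashlib.sha256(data).hexdigest()
            self.hash_count += 1
            # Store the payload only if this content has not been seen yet.
            self.by_hash.setdefault(digest, data)
            entry = (stamp, digest)
            self.by_inode[key] = entry
        return self.by_hash[entry[1]]


def demo():
    """Two identical files at different paths, each requested twice."""
    cache = DedupCache()
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for site in ("site_a", "site_b"):
            path = os.path.join(tmp, site + "_index.php")
            with open(path, "w") as f:
                f.write("<?php echo 'same';\n")
            paths.append(path)
        for path in paths * 2:
            cache.get(path)
    return len(cache.by_inode), len(cache.by_hash), cache.hash_count


if __name__ == "__main__":
    print(demo())
```

In this sketch the two files produce two index entries but a single stored payload, and each file is hashed once rather than on every request.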
 [2012-12-04 19:31 UTC] rasmus@php.net
Right, so what you are really asking for is a symlink implementation at the APC 
level even though you have a perfectly good symlink implementation at the 
filesystem level.
 [2012-12-05 07:52 UTC] gopalv@php.net
The simplest approach is the most low level one.

If you have a large number of identical files, make sure that they are all hard 
links to the original copy.

If you're on Ubuntu, apt-get install hardlink should get you an easy utility 
to de-dup files within /var/ - but be warned, not all duplicate files are 
created equal; some of them might be edited in the future.
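The warning about later edits can be demonstrated with a small sketch. The file names and the function `demo_shared_edit` are hypothetical. It contrasts an in-place write, which changes the content seen through every hard-linked name, with a write to a temporary file followed by a rename, which gives the edited name a new inode and leaves the other name untouched.

```python
import os
import tempfile


def demo_shared_edit():
    """Return what site_a contains after two different ways of editing site_b."""
    with tempfile.TemporaryDirectory() as tmp:
        site_a = os.path.join(tmp, "site_a.php")
        site_b = os.path.join(tmp, "site_b.php")
        with open(site_a, "w") as f:
            f.write("v1")
        os.link(site_a, site_b)  # site_b is a second name for the same inode

        # In-place edit: truncates and rewrites the shared inode.
        with open(site_b, "w") as f:
            f.write("v2")
        with open(site_a) as f:
            after_in_place = f.read()

        # Write-then-rename: site_b now points at a fresh inode.
        tmp_path = site_b + ".tmp"
        with open(tmp_path, "w") as f:
            f.write("v3")
        os.replace(tmp_path, site_b)
        with open(site_a) as f:
            after_replace = f.read()

        return after_in_place, after_replace


if __name__ == "__main__":
    print(demo_shared_edit())
```

Whether a given editor or upgrade script writes in place or replaces the file is outside the scope of this sketch and would need to be checked per tool.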

People have created so-called "cowlink" files to work around that problem, but 
I'm not really sure it's a clean approach (http://lwn.net/Articles/75232/).

Short of that, you could just identify the identical files and pass them through 
a ln -f command against a standard wordpress install.
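A rough sketch of that last suggestion, under several assumptions: the function name `relink_identical` and the directory layout are hypothetical, both trees are assumed to sit on the same filesystem (hard links cannot cross filesystems), and a file is only relinked when it is byte-identical to the file at the same relative path in the reference install.

```python
import filecmp
import os
import tempfile


def relink_identical(reference_dir, target_dir):
    """Hard-link files in target_dir to byte-identical files in reference_dir.

    Returns the relative paths that were relinked.
    """
    relinked = []
    for root, _dirs, files in os.walk(reference_dir):
        for name in files:
            ref = os.path.join(root, name)
            rel = os.path.relpath(ref, reference_dir)
            tgt = os.path.join(target_dir, rel)
            if os.path.islink(tgt) or not os.path.isfile(tgt):
                continue  # missing, not a regular file, or a symlink
            if os.path.samefile(ref, tgt):
                continue  # already the same inode
            if not filecmp.cmp(ref, tgt, shallow=False):
                continue  # contents differ, leave it alone
            # Create the link under a temporary name, then swap it in.
            tmp = tgt + ".relink-tmp"
            os.link(ref, tmp)
            os.replace(tmp, tgt)
            relinked.append(rel)
    return relinked


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        ref_dir = os.path.join(tmp, "reference")
        site_dir = os.path.join(tmp, "site")
        os.makedirs(ref_dir)
        os.makedirs(site_dir)
        contents = {"index.php": "<?php // stock\n", "custom.php": "<?php // stock\n"}
        for name, text in contents.items():
            with open(os.path.join(ref_dir, name), "w") as f:
                f.write(text)
        with open(os.path.join(site_dir, "index.php"), "w") as f:
            f.write(contents["index.php"])
        with open(os.path.join(site_dir, "custom.php"), "w") as f:
            f.write("<?php // edited by the site owner\n")
        relinked = relink_identical(ref_dir, site_dir)
        same = os.path.samefile(
            os.path.join(ref_dir, "index.php"), os.path.join(site_dir, "index.php")
        )
        return sorted(relinked), same


if __name__ == "__main__":
    print(demo())
```

Note that all names of a hard-linked file share one owner and one permission mode, so a relinked file takes on the ownership of the reference copy.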
 [2012-12-05 08:15 UTC] marrtins at hackers dot lv
Yes, I was considering hard/soft links, but changing the nature of these installations won't be such a good idea. For example, one user may suddenly upgrade his WordPress installation. There would have to be a script that runs frequently and updates/deletes/adds links. Ownership and permission management won't be that easy in this situation either.
 