PHP :: Request #63689 :: Caching by hash not path

Request #63689	Caching by hash not path
Submitted:	2012-12-04 18:20 UTC	Modified:	2012-12-05 07:52 UTC
From:	marrtins at hackers dot lv	Assigned:
Status:	Wont fix	Package:	APC (PECL)
PHP Version:	5.3.19	OS:	Linux
Private report:	No	CVE-ID:	None

[2012-12-04 18:20 UTC] marrtins at hackers dot lv

Description:
------------
I have one system with quite a lot of WordPress installations. There are very many identical files, but with different paths. It would be nice to have them hashed and stored by this hash key instead of by path.

[2012-12-04 18:26 UTC] rasmus@php.net

APC doesn't actually use the path as the key. It uses the inode. Hashing the 
entire file on every request would be prohibitively expensive.
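
To illustrate the difference between path, content, and inode identity, here is a minimal Python sketch. The `(device, inode)` pair used as the key is an assumption for illustration; the thread only says that APC keys on the inode. The helper names `inode_key` and `demo` are hypothetical.

```python
import os
import shutil
import tempfile


def inode_key(path):
    """Return a (device, inode) pair, the kind of key an inode-based cache could use."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo():
    """Create a file, a byte-identical copy, and a hard link; return their keys."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "a.php")
        copy = os.path.join(d, "b.php")
        link = os.path.join(d, "c.php")
        with open(original, "w") as f:
            f.write("<?php echo 'hi';\n")
        shutil.copy(original, copy)  # same bytes, but a new inode
        os.link(original, link)      # hard link: a second name for the same inode
        return inode_key(original), inode_key(copy), inode_key(link)
```

A single `stat` call yields the key without reading the file, which is why it is cheap per request. The copy gets its own key even though its bytes are identical, while the hard link shares the key of the original.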

[2012-12-04 18:26 UTC] rasmus@php.net

-Status: Open
+Status: Wont fix

[2012-12-04 18:35 UTC] marrtins at hackers dot lv

You don't need to hash on every request. How about calculating the hash only when an inode is first accessed (or when its stat changes)? This way it would be possible to keep track of duplicates.

[2012-12-04 18:39 UTC] rasmus@php.net

But you are asking to cache by hash. If the hash is the key it has to be 
calculated every time. If the hash is not the key, then I don't really see the 
point. If the key is still the inode I will still need a cache entry for each 
inode and then you haven't saved anything.

[2012-12-04 18:46 UTC] marrtins at hackers dot lv

Yes, I find it difficult to give an informed opinion, as I am not that familiar with the inner workings of APC. But for duplicated content in a second inode you could use a simple pointer to the cached slot for the first inode. Hash each file/inode only once, when it is first accessed (or when its stat changes), and compare that hash with the hashes of the files/inodes already seen. If the hash matches, store a pointer to the already-cached slot, not the whole file.
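
The proposal above could be sketched roughly as follows. This is a hypothetical Python illustration, not how APC works: the class `DedupCache`, its fields, the use of SHA-256, and the raw file bytes standing in for the cached payload are all assumptions.

```python
import hashlib
import os


class DedupCache:
    """Hypothetical sketch: inode-keyed lookups that share one slot per distinct content."""

    def __init__(self):
        self.by_inode = {}   # (dev, ino) -> (mtime_ns, size, digest)
        self.slots = {}      # digest -> payload (stand-in for compiled opcodes)
        self.hash_count = 0  # how many times a file was actually hashed

    def get(self, path):
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        entry = self.by_inode.get(key)
        # Hash only on first access to an inode, or when its stat data changed.
        if entry is None or entry[:2] != (st.st_mtime_ns, st.st_size):
            with open(path, "rb") as f:
                data = f.read()
            digest = hashlib.sha256(data).hexdigest()
            self.hash_count += 1
            # Identical content maps to the same slot; store the payload once.
            self.slots.setdefault(digest, data)
            entry = (st.st_mtime_ns, st.st_size, digest)
            self.by_inode[key] = entry
        return self.slots[entry[2]]
```

Note that `by_inode` still holds one small entry per inode; only the payload is shared. The sketch also never evicts a slot once no inode points to it, which a real cache would have to handle.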

[2012-12-04 19:31 UTC] rasmus@php.net

Right, so what you are really asking for is a symlink implementation at the APC 
level even though you have a perfectly good symlink implementation at the 
filesystem level.

[2012-12-05 07:52 UTC] gopalv@php.net

The simplest approach is the most low level one.

If you have a large number of identical files, make sure that they are all hard 
links to the original copy.

If you're on Ubuntu, apt-get install hardlink should get you an easy utility 
to de-dup files within /var/ - but be warned, not all duplicate files are 
created equal: some of them might be edited in the future.

People have created so-called "cowlink" files to work around that problem, but 
I'm not really sure it's a clean approach (http://lwn.net/Articles/75232/).

Short of that, you could just identify the identical files and pass them through 
a ln -f command against a standard wordpress install.
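
The "identify the identical files and link them" step could look roughly like the Python sketch below. It is an illustration under assumptions: the function name `hardlink_duplicates`, its parameters, and the rule of matching files by the same relative path in a reference install are all hypothetical, and both trees must live on the same filesystem for `os.link` to succeed.

```python
import filecmp
import os


def hardlink_duplicates(reference_dir, install_dir, dry_run=True):
    """Replace files in install_dir that are byte-identical to the file at the
    same relative path in reference_dir with hard links to the reference copy.
    Returns the relative paths that were (or, in a dry run, would be) linked."""
    linked = []
    for root, _dirs, files in os.walk(reference_dir):
        for name in files:
            ref = os.path.join(root, name)
            rel = os.path.relpath(ref, reference_dir)
            target = os.path.join(install_dir, rel)
            if os.path.islink(ref) or os.path.islink(target):
                continue  # leave symlinks alone
            if not os.path.isfile(target):
                continue  # nothing to replace in this install
            if os.path.samefile(ref, target):
                continue  # already the same inode
            if not filecmp.cmp(ref, target, shallow=False):
                continue  # contents differ, e.g. an edited or upgraded file
            linked.append(rel)
            if not dry_run:
                # Link under a temporary name, then rename over the target,
                # so the target path never disappears midway.
                tmp = target + ".hardlink-tmp"
                os.link(ref, tmp)
                os.replace(tmp, target)
    return linked
```

Running with `dry_run=True` first shows what would change. Because hard links share one inode, the linked files also share ownership, permissions, and any later in-place edits.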

[2012-12-05 08:15 UTC] marrtins at hackers dot lv

Yes, I was considering hard/soft links, but changing the nature of these installations wouldn't be such a good idea. For example, one user may suddenly upgrade his WordPress installation. There would have to be a script that runs frequently and updates/deletes/adds links. Ownership and permission management in this situation wouldn't be that easy either.



Copyright © 2001-2025 The PHP Group All rights reserved.	Last updated: Sun Jul 06 02:01:36 2025 UTC