Bug #58212 Issues with hard links
Submitted: 2008-06-03 10:53 UTC Modified: 2011-04-07 10:57 UTC
Votes:3
Avg. Score:5.0 ± 0.0
Reproduced:2 of 2 (100.0%)
Same Version:0 (0.0%)
Same OS:0 (0.0%)
From: rpeters at icomproductions dot ca Assigned:
Status: Wont fix Package: APC (PECL)
PHP Version: 5.2.2 OS: CentOS 4.3
Private report: No CVE-ID: None
 [2008-06-03 10:53 UTC] rpeters at icomproductions dot ca
Description:
------------
I posted this as a comment in a closed bug, but it may not be the same as the original, and may not get enough attention as just a comment, so I'm posting it as a new bug here.

The following code behaves differently with APC installed.

Reproduce code:
---------------
/1/test.php:
<?php
require(dirname(__FILE__) . '/testDest.php');
?>

/2/test.php: a hard link to /1/test.php, created by running "cp -l ../1/test.php ." inside /2

/1/testDest.php:
<?php
echo 1;
?>

/2/testDest.php:
<?php
echo 2;
?>

Expected result:
----------------
/1/test.php displays "1", and /2/test.php displays "2"

Actual result:
--------------
Both scripts will return the results of whichever one you hit first.
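
A minimal sketch that builds the layout above, assuming a POSIX filesystem where PHP's link() can create hard links. The helper name buildHardLinkRepro and the temporary directory name are hypothetical and not part of the report. On a plain PHP CLI without APC, the two includes should print "1" and then "2", which is the expected result described above.

```php
<?php
// Hypothetical helper: builds the two-directory layout from the report
// under $base and returns $base.
function buildHardLinkRepro($base)
{
    foreach (array('1', '2') as $dir) {
        if (!is_dir("$base/$dir") && !mkdir("$base/$dir", 0777, true)) {
            throw new RuntimeException("cannot create $base/$dir");
        }
        // Each directory gets its own testDest.php that echoes the directory name.
        file_put_contents("$base/$dir/testDest.php", "<?php\necho $dir;\n");
    }
    file_put_contents(
        "$base/1/test.php",
        "<?php\nrequire(dirname(__FILE__) . '/testDest.php');\n"
    );
    // Same effect as "cp -l": a second name for the same inode.
    if (!link("$base/1/test.php", "$base/2/test.php")) {
        throw new RuntimeException('could not create the hard link');
    }
    return $base;
}

$base = buildHardLinkRepro(sys_get_temp_dir() . '/hardlink-repro-' . uniqid());

// Capture what the two entry scripts print when included one after the other.
ob_start();
include "$base/1/test.php";
include "$base/2/test.php";
$output = ob_get_clean();
```

Serving the two test.php files through a web server with APC loaded and apc.stat=1 is where the report says both return the result of whichever was requested first.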

 [2008-07-02 11:56 UTC] rpeters at icomproductions dot ca
This is completely preventing us from using APC on our production server. Is it possible to get an estimated fix date for this, or at least an indication that it is being worked on?
 [2008-07-02 12:42 UTC] shire@php.net
__FILE__ is resolved at compile-time so it's "hard-coded" into the compiled code that's being cached.  If you replace __FILE__ with something that happens during execution such as a function call to determine your current filename or a $_SERVER variable this should work as you expect.  Because it can sometimes be useful to have __FILE__ work in this way, it's best if you determine your path another way in this use case.

Please re-open this if these other options also behave in the same way.
 [2008-07-02 12:56 UTC] rpeters at icomproductions dot ca
I'm surprised and disappointed that you are satisfied that APC alters the intended behavior of PHP, and that I have to change our code in order to have it function in an APC environment.

That said, I am curious what portable runtime equivalents there are to __FILE__, since it seems to be the only recommended way to avoid PHP's relative-path include issues.

The only other method I know of is $_SERVER['SCRIPT_FILENAME'], but this is Apache-only.
 [2008-07-02 13:11 UTC] rasmus@php.net
You understand what an opcode cache does, right?  It caches the result of the compile stage in shared memory to avoid having to recompile on each request.  Things that are resolved at compile-time, like __FILE__ will therefore by definition be cached, and if you are going to use an opcode cache you have to be somewhat aware of the difference between the compile and execute stages.
 [2008-07-02 13:33 UTC] rpeters at icomproductions dot ca
I understand there may be difficulty correcting this, and that it may result in additional work being done, by adding additional criteria to what's considered the "same" file (eg: inode && __FILE__ match, rather than just inode), or perhaps even requiring parsing and altering the file before running it through the compiler, but that really isn't the issue here. No offence meant, but finding an efficient way to fix it is your job, not mine.

The issue I have with marking this as Wont Fix is twofold:
1) There appears to be no work-around. I have yet to find something that reliably works on all web servers that replaces __FILE__. Every time there's something similar, they always point to __FILE__ as the definitive way of finding the full filesystem path of the currently running file. Please enlighten me if there is a method I have missed.
2) This alters the expected behavior of PHP. APC is a _cache_, and should as part of the principle, under no circumstances alter _behavior_.
 [2008-07-02 13:46 UTC] rasmus@php.net
It is an opcode cache.  It will run the cached opcodes identically each and every time.  Defeating the compiler and moving things into the executor in order to mimic the behaviour of per-request compilation exactly would defeat the purpose of the opcode cache.

Why can't you just use PHP_SELF, by the way?
 [2008-07-02 14:06 UTC] rpeters at icomproductions dot ca
Ultimately, this could be described as a PHP bug. __FILE__ should be a run-time variable, and not a compile-time constant, due to the possibility of hard-linking. But until PHP corrects this, I would expect APC to include a work-around, since it is the module introducing the circumstances where the compile-time and run-time values can differ.

$_SERVER['PHP_SELF'] does not include full path information, and also does include virtual directories. For both those reasons I believe it is therefore not applicable for require();

In production, I have a file prepend.inc.php that is included from every file in the system. This means the working directory is effectively random. It needs to include configuration.inc.php, located in the same directory as prepend.inc.php.

The only way I have found to discover the containing directory of prepend.inc.php from within prepend.inc.php is dirname(__FILE__).

This is a very common and widely accepted approach to this type of problem, but mixing it with hard-links appears to be quite rare, or you would have seen many other bug reports of this in the past.

By focusing on the pre-parsing of the file, I'm assuming there is a reason to also discard my suggestion of adding a simple filename check in addition to the inode number check? Perhaps making this optional, to maintain performance for those who don't need it?
 [2008-07-02 14:44 UTC] rpeters at icomproductions dot ca
I would like to note that PHP 5.3.0 has also introduced __DIR__, which is the equivalent of dirname(__FILE__), for precisely this purpose. So I expect that you will see more people encountering this issue in the future, as developers adopt this officially recommended method of finding absolute file-system paths.
 [2008-07-02 16:04 UTC] rpeters at icomproductions dot ca
Looking at similar bug reports for other cache systems, it appears that this bug also rears its head when renaming already cached files, which makes sense, and may be more common than hard-linking.
 [2008-07-02 16:30 UTC] shire@php.net
__FILE__ and similar magic constants are intended to be just this, compile-time constants, so I don't think they can be considered a PHP bug simply because there may be cases where users might want them to function like PHP variables.  There are optimizations users can make when not doing hard-links and such that take advantage of this characteristic.

While it's true that $_SERVER['PHP_SELF'] is not a full path it can easily be resolved using something like:

<?php realpath($_SERVER['PHP_SELF']); ?>
 [2008-07-02 16:41 UTC] rpeters at icomproductions dot ca
PHP documents __FILE__ as "The full path and filename of the file. If used inside an include, the name of the included file is returned", so it failing to do so would be fair to consider either a bug, or bad documentation, you pick. It's also the only magic "constant" to maintain its value throughout execution of a script, so saying that it's _expected_ to be set at compile time isn't fair either. Regardless, no matter what classification this has, it needs a workaround.

I still have yet to find one.
A quick test shows that $_SERVER['PHP_SELF'] reflects the executed script, and not the included filename. realpath($_SERVER['PHP_SELF']) actually returns null in my dev setup, and I'm not sure why.
 [2008-07-02 16:48 UTC] rasmus@php.net
Switch to statless mode then.  apc.stat=0
That should provide the workaround you need since no inodes will be involved anymore.
 [2008-07-02 16:50 UTC] rpeters at icomproductions dot ca
My understanding is that with stat set to 0, "You will have to restart the server if you change anything."

That is not an acceptable work-around, since this is a production server, and should never be restarted.
 [2008-07-02 16:52 UTC] rasmus@php.net
Or clear the cache.  If it is a production server, you shouldn't be modifying files on it anyway, only doing controlled pushes to it at which point you either programmatically clear the cache or restart the server.
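
A minimal sketch of clearing the cache programmatically after a controlled push, assuming the APC extension is loaded in the web server process. The function name clearOpcodeCacheAfterDeploy is hypothetical; apc_clear_cache() is the extension's own function, and calling it without an argument clears the system (opcode) cache rather than the user cache.

```php
<?php
// Hypothetical deploy hook: call once after new files have been pushed.
function clearOpcodeCacheAfterDeploy()
{
    if (!function_exists('apc_clear_cache')) {
        // APC is not loaded here, so there is nothing to clear.
        return false;
    }
    // No argument: clear the cached compiled files, not the user cache.
    return apc_clear_cache();
}

$cleared = clearOpcodeCacheAfterDeploy();
```

The call has to run inside the server that holds the cache, for example from a protected URL, because a separate command-line process would not share the web server's cache.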
 [2008-07-02 16:56 UTC] rpeters at icomproductions dot ca
The manual also states "note that if you are using relative path includes APC has to check in order to uniquely identify the file."

Which leads me to believe my relative-path inclusion of prepend.inc.php will still cause the use of the inode to identify the file. Please let me know if this is not the case.

Trying to include prepend using absolute paths brings us right back to the original problem of finding the absolute path.
 [2008-07-02 16:56 UTC] rasmus@php.net
The other option is of course to not cache the prepend file using apc.filters.  So, plenty of workarounds.  We will not be changing __FILE__ nor __DIR__ in PHP to be execute-time, and we definitely will not be doing any sort of APC-level hack in PHP to make these execute-time either.
 [2008-07-02 17:14 UTC] rpeters at icomproductions dot ca
I did a quick test on my development system, and it appears that stat=0 will work.

I'm still not pleased that I'm forced to manually refresh the cache when files change, and still think this is something you should be addressing, but the additional overhead this will cause us should still pay off in performance at this point.
 [2008-07-02 17:25 UTC] rpeters at icomproductions dot ca
apc.filters seems to be a better work-around, but being filename-specific, is far from a general solution.

Either way, these appear to address my immediate concern regarding this, thank you.

I still find the resistance to fixing this issue hard to comprehend though, and have yet to hear that checking the filename in addition to the inode is not an easy fix. (In fact, I'm unclear on why the inode was ever chosen over just the filename in the first place, but that's rather irrelevant.)
 [2008-07-02 17:53 UTC] rpeters at icomproductions dot ca
Since there's such resistance to doing anything to APC to rectify this, and no one has produced a runtime alternative to __FILE__ to get around this issue, I have filed a feature request for PHP to introduce one.

http://bugs.php.net/bug.php?id=45421
 [2008-07-02 18:03 UTC] rpeters at icomproductions dot ca
Sorry for continuing to push this issue, but it really does bother me.
I found the reason for using inode instead of full path, and so have come up with the following proposed solution:
Since stat() also returns nlink, (optionally) adding a full filename check is reasonable if nlink > 2, since then __FILE__ and the like could be reasonably assumed to be different.
 [2008-07-02 18:09 UTC] rpeters at icomproductions dot ca
Of course, I meant if nlink >= 2
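
The proposal above can be sketched roughly as follows. This is an illustration of the suggested rule only, not APC's code: the function name proposedCacheKey and the key format are hypothetical, and it assumes a POSIX filesystem where stat() reports meaningful dev, ino and nlink values.

```php
<?php
// Hypothetical sketch of the proposed rule; not APC's implementation.
function proposedCacheKey($path)
{
    $st = stat($path);
    if ($st === false) {
        return false;
    }
    $key = $st['dev'] . ':' . $st['ino'];
    // Only a file with more than one name needs its path to tell the names apart.
    if ($st['nlink'] >= 2) {
        $key .= ':' . realpath($path);
    }
    return $key;
}

// Demo: one ordinary file, and one file with two hard-linked names.
$dir = sys_get_temp_dir() . '/nlink-demo-' . uniqid();
mkdir($dir);
file_put_contents("$dir/single.php", "<?php\n");
file_put_contents("$dir/a.php", "<?php\n");
link("$dir/a.php", "$dir/b.php");
clearstatcache(); // make sure nlink reflects the new link

$single = proposedCacheKey("$dir/single.php"); // dev:ino only
$a = proposedCacheKey("$dir/a.php");           // dev:ino plus the path of a.php
$b = proposedCacheKey("$dir/b.php");           // dev:ino plus the path of b.php
```

Files with a single link keep the short inode-based key, so the extra path comparison is paid only for hard-linked files.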
 [2009-06-23 07:19 UTC] m dot moeller at bigpoint dot net
same issue here.

we have different clones of applications, using a hardlink-instead-of-copy method, which renders apc absolutely unusable.

in my understanding, apc should use the normalized path of the PHP file instead of the stat inode/dev combination, as inode+dev can refer to multiple filenames, referring to multiple __FILE__ constants.
 [2009-06-23 07:25 UTC] m dot moeller at bigpoint dot net
looking at APC's source code, i can see that apc.stat = 0 triggers caching by full path (APC_CACHE_KEY_FPFILE), while apc.stat = 1 uses inode/dev combination.

This behaviour of apc.stat has absolutely nothing to do with its description!

It would make sense to add a configuration variable apc.filekey=[fullpath|inode] for controlling this behaviour, and leaving apc.stat to do what it is described to do.
 [2009-06-23 08:35 UTC] gopalv82 at yahoo dot com
/opt/php53/bin/php -dvld.active /tmp/x.php

line     #  op                           fetch          ext  return  operands
-------------------------------------------------------------------------------
   3     0  ECHO                                                     '%2Ftmp%2Fx.php'

Basically there is no function or any sort of dynamic lookup for __FILE__. 

The Zend compiler does an in-place replace with a string.

I'm a little ticked off too that the "dynamic" constant  thing has been marked as bogus. Let me think a little harder about it.
 [2009-06-23 08:39 UTC] gopalv82 at yahoo dot com
And as far as stat=1 goes: the reason stat=0 uses a full path is that getting inode+dev is exactly what a stat() is used for. The full-path hash is trying to avoid that extra syscall.

To be more clear, in the beginning we had just inode+dev keys, we threw in the full path keys to avoid the stat() call.

With a full path key, the hard links look like different files to the apc cache and they are compiled & cached independently. 

So every one of those independent opcode streams has its own filename as a constant string. 

But with hardlinks and stat=0, you waste cache space by compiling & caching the same file (in reality) multiple times just to change the __FILE__ value in there.
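
A rough illustration of the two key styles described above, assuming a POSIX filesystem. The functions keyByInode and keyByFullPath are hypothetical stand-ins, not APC's code, and realpath() is used here only as an approximation of a full-path key.

```php
<?php
// Hypothetical stand-ins for the two key styles; not APC's implementation.
function keyByInode($path)
{
    $st = stat($path); // the extra syscall that stat=0 avoids
    return $st['dev'] . ':' . $st['ino'];
}

function keyByFullPath($path)
{
    return realpath($path); // hard links are not resolved, so each name stays distinct
}

$dir = sys_get_temp_dir() . '/apc-key-demo-' . uniqid();
mkdir($dir);
file_put_contents("$dir/x.php", "<?php echo __FILE__;\n");
link("$dir/x.php", "$dir/y.php");
clearstatcache();

// Two names, one inode: the inode key collides, the path key does not.
$sameInodeKey = keyByInode("$dir/x.php") === keyByInode("$dir/y.php");
$samePathKey  = keyByFullPath("$dir/x.php") === keyByFullPath("$dir/y.php");
```

With the inode key both names share one cache entry, and with it one compiled __FILE__ string. With the path key each name gets its own entry, at the cost of compiling and storing the same file more than once.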
 [2009-06-23 10:26 UTC] rasmus@php.net
In simpler terms, stat=0 removes the stat call which means we don't get the inode+dev anymore and have to use something else as the cache key.  So, this is correctly named because getting rid of the stat is the main thing it does, and changing the nature of the key is a side effect.
 [2009-08-19 13:24 UTC] rpeters at icomproductions dot ca
Anyone else who thinks it's important that opcode caches should be able to properly support __FILE__ (or an equivalent), please add your support (even just votes) to http://bugs.php.net/bug.php?id=45421
 [2009-08-19 13:25 UTC] rpeters at icomproductions dot ca
Also, has there been any further thought on optimizing by allowing a flag to check filename only when nlink >= 2? That would allow us to maintain the speed benefits of stat=1 when not affected.
 [2011-01-24 17:43 UTC] rpeters at icomproductions dot ca
Please note that this is still a problem for us, and I am still unsatisfied with a "Wont fix" status.
 [2011-02-15 04:19 UTC] graham at stormbrew dot ca
I'd like to lend my voice to the fact that this is 
completely unacceptable behaviour. A hard linked file should 
appear to *all* uses of it as if it is a different file, 
except when explicitly investigating its inode or link 
count. Any other behaviour is broken. 

The path of file /path/to/X, hard linked to file /path/to/Y 
is absolutely not /path/to/Y. If part of the encoding of a 
php script is its full path, then two scripts with identical 
inodes in different locations are NOT identical scripts and 
should not be cached as such.

Requiring apc.stat to be off is also an unacceptable 
workaround for many configurations under which php is run in 
the wild.
 [2011-02-15 07:57 UTC] gopalv@php.net
> A hard linked file should appear to *all* uses of it as if 
> it is a different file, except when explicitly 
> investigating its inode or link count. Any other behaviour 
> is broken. 

That's pretty much the problem - apc's internal "unique identifier" is inode#.
 [2011-02-15 12:35 UTC] rasmus@php.net
These "I want the sky to be green" comments here aren't all 
that constructive. We've explained the technical limitations. 
Submit a patch that addresses these limitations if you can, 
otherwise you will have to make appropriate code changes to 
work within what is technically feasible.
 [2011-02-17 11:04 UTC] rpeters at icomproductions dot ca
Please make up your mind. Is this a "Won't fix", at which point there is no reason to submit a patch, or is this a valid bug?

If you admit this is a real problem, can we please give it the respect it deserves and leave it open?
 [2011-02-17 12:37 UTC] rasmus@php.net
It is a "Won't Fix" because of the technical limitations.
 [2011-04-07 03:44 UTC] hanwoody at gmail dot com
Our setup is unusual: there are many files with identical md5 
hashes on our servers, so we deduplicate them using hard links, BUT 
APC cannot support it. We're not satisfied with the "won't fix" status!!!
 [2011-04-07 10:57 UTC] rasmus@php.net
That's nice. Submit a patch then. I see no clean way of 
fixing this. If you do and your patch makes sense I will 
happily commit it.
 [2011-04-11 21:00 UTC] simpcl2008 at gmail dot com
May I ask a question? 

APC does not check the modify time of php file when it is 
finding the opcode cache with the full path as the key 
(apc.stat = 0). Why?

A problem is that opcode cache can not be updated when the php 
file is modified.
 [2011-04-27 04:58 UTC] simpcl2008 at gmail dot com
1. use following patch:
http://bugs.php.net/patch-display.php?bug_id=45421&patch=apc_auto_hardlinks_for_php_5.3.5.diff&revision=1303892859&download=1

2. change the apc source code:
--- APC-3.1.6/apc_main.c	2010-11-30 18:18:31.000000000 +0800
+++ APC-3.1.6-sae/apc_main.c	2011-04-27 15:56:34.000000000 +0800
@@ -559,6 +559,7 @@ static zend_op_array* my_compile_file(ze
             if (h->type != ZEND_HANDLE_FILENAME) {
                 zend_llist_add_element(&CG(open_files), h); 
             }
+            op_array->filename = estrdup(filename);
             return op_array;
         }
         if(APCG(report_autofilter)) {

3. replace the __FILE__ with executed_filename

Hope this can fix your apc hardlinks problem.
 [2012-04-23 03:36 UTC] wl at soplwang dot com
That's APC's cache problem, not PHP's...

If you set apc.stat = 0, APC does not use the inode. So I hope that with apc.stat = 1, 
APC could also avoid depending on the inode.
 [2014-01-11 00:37 UTC] ashutosh108 at gmail dot com
I'm also suffering from this, and yes, I think this is a bug in APC. It could be fixed, as others pointed out in comments, by using not just the inode number, but both the inode number and the full path, as the key for a cached file. I believe forcing people to use "apc.stat=0" is an unacceptable workaround.

Reproduced on PHP 5.3.3, APC 3.1.3p1-2 (Debian squeeze).

I've carefully examined all comments posted here and I insist this is a genuine APC bug. Please reopen and fix it.
 [2015-07-07 09:55 UTC] andrey at optimuspro dot ru
Just faced this bug. 
>We've explained the technical limitations. 
I've re-read the whole thread and I've understood the following:

1. In 'stat=1' mode APC uses the inode/dev combination to uniquely identify each file. It requires an extra stat() call...
2. To improve performance the 'stat=0' mode was introduced. The side effect of this modification was the requirement to use something other than the inode/dev combination as the cache key, since it became unavailable. The file path was chosen to be the cache key in this situation.

So it describes why 'stat=0' uses the file path as a key, but doesn't give any reason why this behaviour cannot be extended back to the 'stat=1' mode, especially since the file path approach has shown itself to be more universal in light of the issues that occurred with hard links and the other cases mentioned here (the only reason given for using inode/dev is that it was this way from the beginning). 

Regarding the waste of cache space from caching 'actually the same file, located in different places': well, the cases with __FILE__ have proven that in the PHP world the statement about 'the same file' is not true. If a script's behaviour changes depending on its location, then two scripts in different places are not the same, even if their content is identical...

I will most likely be ok with the 'stat=0' mode, since we use hard links on our staging server only to quickly instantiate clones for testing, but a good explanation of the reasons would help to close this question once and for all.
 