Bug #43864 unnecessary lstat64
Submitted: 2008-01-16 08:21 UTC Modified: 2009-06-19 19:30 UTC
Votes:15
Avg. Score:4.1 ± 0.9
Reproduced:12 of 12 (100.0%)
Same Version:6 (50.0%)
Same OS:10 (83.3%)
From: sq6elt at wp dot pl Assigned:
Status: Wont fix Package: Performance problem
PHP Version: 5.2.5 OS: Linux
Private report: No CVE-ID: None
 [2008-01-16 08:21 UTC] sq6elt at wp dot pl
Description:
------------
With many includes located deep in the file system, there may be
a performance impact because of many unnecessary lstat calls.
Why is any lstat made at all when the path is specified exactly?
A simple access should be sufficient, or have I missed something?
If some of these calls are required, they should be done only once.
I have checked PHP 4; it behaves the same way.
It makes no difference whether I use include, include_once, require, or require_once.

Reproduce code:
---------------
Create a directory tree:
mkdir -p /tmp/a/b/c/d/e/f/g/h/i/j

Two empty php scripts:
echo '<? ?>' > /tmp/a/b/c/d/e/f/g/h/i/j/a.php
echo '<? ?>' > /tmp/a/b/c/d/e/f/g/h/i/j/b.php

and a main script /tmp/t.php

<?
  include "/tmp/a/b/c/d/e/f/g/h/i/j/a.php";
  include "/tmp/a/b/c/d/e/f/g/h/i/j/b.php";
?>




Actual result:
--------------
Run strace on t.php:
strace /usr/bin/php5 t.php 2>&1 | grep lstat64

And guess what:
lstat64("/usr", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/usr/bin", {st_mode=S_IFDIR|0755, st_size=49152, ...}) = 0
lstat64("/usr/bin/php5", {st_mode=S_IFREG|0755, st_size=5510176, ...}) = 0
lstat64("/etc", {st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
lstat64("/etc/php5", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/etc/php5/cli", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/etc/php5/cli/php.ini", {st_mode=S_IFREG|0644, st_size=44278, ...}) = 0
lstat64("/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=8192, ...}) = 0
lstat64("/tmp/t.php", {st_mode=S_IFREG|0644, st_size=94, ...}) = 0
lstat64("/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=8192, ...}) = 0
lstat64("/tmp/a", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e/f", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e/f/g/h", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e/f/g/h/i", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e/f/g/h/i/j", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e/f/g/h/i/j/a.php", {st_mode=S_IFREG|0644, st_size=11, ...}) = 0
lstat64("/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=8192, ...}) = 0
lstat64("/tmp/a", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e/f", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e/f/g/h", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e/f/g/h/i", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e/f/g/h/i/j", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/tmp/a/b/c/d/e/f/g/h/i/j/b.php", {st_mode=S_IFREG|0644, st_size=7, ...}) = 0


 [2008-01-16 17:18 UTC] nlgordon at iastate dot edu
I've seen this same issue, and I agree that it is excessive considering the realpath cache. It appears to be a byproduct of using realpath on pretty much all file accesses for includes.

Theoretically the realpath cache could be extended to cover directories and the like. I know I would like that; my servers get hit hard because I'm serving PHP out of AFS space.
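
A minimal sketch of what caching directory prefixes could look like, written in plain C rather than against the PHP source. The names check_path_cached, prefix_is_cached, lstat_calls and the fixed-size table are hypothetical and chosen only for illustration. The sketch never expires entries and does not expand symlinks, so it shows only how repeated lstat calls on shared parent directories can be skipped.

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations such as lstat() */
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

#define CACHE_SLOTS 64
#define PATH_LEN    1024

static char cached_prefix[CACHE_SLOTS][PATH_LEN];  /* prefixes already checked */
static int  cached_count;
static int  lstat_calls;    /* number of real lstat() calls issued so far */

static bool prefix_is_cached(const char *prefix)
{
    for (int i = 0; i < cached_count; i++) {
        if (strcmp(cached_prefix[i], prefix) == 0) {
            return true;
        }
    }
    return false;
}

/* lstat() every prefix of an absolute path ("/tmp", "/tmp/a", ...), skipping
 * prefixes that were checked before. Returns 0 on success, -1 if the path is
 * not absolute, is too long, or a component cannot be lstat'ed. */
int check_path_cached(const char *path)
{
    char prefix[PATH_LEN];
    size_t len = strlen(path);

    if (path[0] != '/' || len >= PATH_LEN) {
        return -1;
    }
    for (size_t i = 1; i <= len; i++) {
        if (path[i] != '/' && path[i] != '\0') {
            continue;               /* still inside a path component */
        }
        memcpy(prefix, path, i);
        prefix[i] = '\0';
        if (prefix_is_cached(prefix)) {
            continue;               /* no system call needed */
        }
        struct stat st;
        lstat_calls++;
        if (lstat(prefix, &st) != 0) {
            return -1;
        }
        if (cached_count < CACHE_SLOTS) {
            strcpy(cached_prefix[cached_count++], prefix);
        }
    }
    return 0;
}
```

Assuming the directory tree from the report exists, calling check_path_cached on /tmp/a/b/c/d/e/f/g/h/i/j/a.php issues 12 lstat calls, and a following call on .../j/b.php issues only one more, because the eleven directory prefixes are already cached.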
 [2009-06-18 09:35 UTC] sq6elt at wp dot pl
From: TSRM/tsrm_virtual_cwd.c

if (!realpath(path, resolved_path)) {  /* Note: Not threadsafe on older BSD's */
  if (use_realpath == CWD_REALPATH) {
     return 1;
  }
  goto no_realpath;
}
use_realpath = CWD_REALPATH;
CWD_STATE_COPY(&old_state, state);

Manual page says:
BUGS
       Avoid using this function.  It is broken by design...

So please avoid using this function. As stated above, it has a significant performance impact, and as the manual states, it is simply broken.

For now I have disabled it by undefining HAVE_REALPATH.
 [2009-06-18 14:17 UTC] rasmus@php.net
We need the realpath call to determine if there are symlinks in the path.  If /a is a symlink to /b and you do:

require '/a/file.php';
require_once '/b/file.php';

then file.php is actually the same file and the second require_once should do nothing, but we can only know that with a realpath call. 

In 5.3 we have replaced the system-level realpath call with our own implementation which does intra-path caching, so this has been addressed now, but it won't be changed in 5.2.
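
The symlink case can be sketched in plain C. This is not the PHP implementation; same_canonical_path is a hypothetical helper that only shows the effect of canonicalizing both paths before comparing them, and it assumes a POSIX.1-2008 realpath that accepts NULL as the output buffer.

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations such as realpath() */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true when both paths resolve to the same canonical path.
 * realpath() expands every symlink and removes "." and ".." components.
 * Returns false if either path cannot be resolved. */
bool same_canonical_path(const char *a, const char *b)
{
    char *ra = realpath(a, NULL);   /* NULL: realpath() mallocs the result */
    char *rb = realpath(b, NULL);
    bool same = ra != NULL && rb != NULL && strcmp(ra, rb) == 0;

    free(ra);                       /* free(NULL) is a no-op */
    free(rb);
    return same;
}
```

If /a is a symlink to /b, same_canonical_path("/a/file.php", "/b/file.php") returns true. Two hard links to the same file still compare as different, since each has its own canonical path.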
 [2009-06-19 19:30 UTC] sq6elt at wp dot pl
From a logical point of view, these files are different.
Programmers should treat them as different files.

Besides, this is broken for hard links; only symlinks can be detected
by this call.

Simple test (rp.c)

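#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* realpath() needs an output buffer of at least PATH_MAX bytes */
char x[PATH_MAX];

int main(int argc, char* argv[]) {
        if (argc != 2) {
                printf("Provide a path %i\n", argc);
                return 1;
        }
        /* realpath() returns NULL on failure, e.g. if the path does not exist */
        if (realpath(argv[1], x) == NULL) {
                perror("realpath");
                return 1;
        }
        printf("Realpath for %s is %s\n", argv[1], x);
        return 0;
}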

$ ./rp /home/ftp/welcome.msg 
Realpath for /home/ftp/welcome.msg is /home/ftp/welcome.msg

$ mkdir /home/t
$ mount /home/ftp /home/t --bind
$ ./rp /home/t/welcome.msg 
Realpath for /home/t/welcome.msg is /home/t/welcome.msg

So this is only a partial solution, and in some environments
it has a big performance impact.

If it is really required to know whether two files are
the same file, look at stat.
$ stat /home/ftp/welcome.msg 
  File: `/home/ftp/welcome.msg'
  Size: 166             Blocks: 8          IO Block: 4096   regular file
Device: 807h/2055d      Inode: 67117468    Links: 1
[...]

$ stat /home/t/welcome.msg 
  File: `/home/t/welcome.msg'
  Size: 166             Blocks: 8          IO Block: 4096   regular file
Device: 807h/2055d      Inode: 67117468    Links: 1
[...]

One can determine whether two files are really the same by comparing
device and inode.
This comparison does a single stat on the target file instead of stats on each path element, and it also handles hard links.
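
A minimal sketch of that comparison in C. The helper name same_file is hypothetical; the st_dev and st_ino fields are the standard POSIX struct stat members that the stat output above prints as Device and Inode.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Returns true when both paths name the same underlying file, i.e. they
 * share a device number and an inode number. stat() follows symlinks, so a
 * symlink or a hard link compares equal to its target.
 * Returns false if either path cannot be stat'ed. */
bool same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0) {
        return false;
    }
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With the bind mount from the example, same_file("/home/ftp/welcome.msg", "/home/t/welcome.msg") would be expected to return true, since both paths report the same device and inode.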
 