Bug #75101 `PharData` caches file contents, even through file deletion
Submitted: 2017-08-21 09:29 UTC Modified: -
Votes:9
Avg. Score:4.3 ± 0.7
Reproduced:9 of 9 (100.0%)
Same Version:3 (33.3%)
Same OS:1 (11.1%)
From: d28b312d at opayq dot com Assigned:
Status: Open Package: PHAR related
PHP Version: Irrelevant OS: Windows 10 + Cygwin
Private report: No CVE-ID: None

 [2017-08-21 09:29 UTC] d28b312d at opayq dot com
Description:
------------
 - Reproduced with the PHP shipped with Cygwin (7.0.19). I can't find anything in the release notes to suggest this is fixed in newer versions.

`PharData` seems to cache an archive's contents even after the file is no longer on disk. Constructing a new `PharData` with the same filename produces the same object as before. I understand there is an alias parameter, but this is still totally unexpected behaviour, especially as these classes seem to be the go-to way of extracting tar archives these days.

Looking at the source code, this seems to be partly deliberate: `phar_open_parsed_phar` first tries to reuse an already opened phar file. However, if the file has been deleted from disk and a new `PharData` object is created, surely the new data should be loaded? Also, does this mean that once a `PharData` object has opened a tar file, the entire contents of the tar are held in memory for the rest of the PHP script?

There seems to be no way to clear this cache. In the test script, the tgz file is deleted and a new file with different contents is written to the same path, yet opening that path still returns the old archive.

I've tried calling `unset()` on the `PharData` object before creating a new one or calling `clearstatcache()`, but to no avail.

Also of note: even though the default format is `Phar::TAR`, TGZ files seem to be auto-detected and decompressed fine.

Test script:
---------------
<?php

$url = 'http://thrysoee.dk/editline/libedit-%s-3.1.tar.gz';
$downloadFile = '/tmp/foo.tgz';

file_put_contents($downloadFile, fopen(sprintf($url, '20170329'), 'r'));
var_dump(new PharData($downloadFile));

unlink($downloadFile);

file_put_contents($downloadFile, fopen(sprintf($url, '20160903'), 'r'));
var_dump(new PharData($downloadFile));
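
A network-free variant of the same scenario may be easier to run. The sketch below is an illustration under stated assumptions rather than the reporter's script: the helper names `buildTar()` and `listEntries()`, the temporary directory, and the entry names `one.txt` and `two.txt` are all hypothetical. Each archive is built under a staging name and renamed into place, so the target path is only ever opened for reading, as in the script above.

```php
<?php
// Hypothetical offline repro: no download, two small tar files swapped at one path.

// Build a one-entry tar at $path (helper name is an assumption).
function buildTar(string $path, string $entryName): void
{
    $tar = new PharData($path);
    $tar->addFromString($entryName, "contents of $entryName");
}

// Return the file names found at the top level of the archive.
function listEntries(string $path): array
{
    $names = [];
    foreach (new PharData($path) as $file) {
        $names[] = $file->getFilename();
    }
    return $names;
}

$dir = sys_get_temp_dir() . '/phar_repro_' . bin2hex(random_bytes(4));
mkdir($dir);
$target = $dir . '/archive.tar';

// First archive: built elsewhere, moved to the target path, then read.
buildTar($dir . '/staging1.tar', 'one.txt');
rename($dir . '/staging1.tar', $target);
$first = listEntries($target);

unlink($target);

// Second archive with different contents at the same target path.
buildTar($dir . '/staging2.tar', 'two.txt');
rename($dir . '/staging2.tar', $target);
$second = listEntries($target);

var_dump($first, $second);
```

If the behaviour described in this report applies, `$second` would be expected to list `one.txt` again rather than `two.txt`.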

Expected result:
----------------
I expect the second object to be the new file, as shown in the edited output below:

/tmp $ php /tmp/phpbugtest.php
object(PharData)#1 (4) {
  ["pathName":"SplFileInfo":private]=>
  string(40) "phar:///tmp/foo.tgz/libedit-20170329-3.1"
  ["fileName":"SplFileInfo":private]=>
  string(20) "libedit-20170329-3.1"
  ["glob":"DirectoryIterator":private]=>
  bool(false)
  ["subPathName":"RecursiveDirectoryIterator":private]=>
  string(0) ""
}
object(PharData)#2 (4) {
  ["pathName":"SplFileInfo":private]=>
  string(40) "phar:///tmp/foo.tgz/libedit-20160903-3.1"
  ["fileName":"SplFileInfo":private]=>
  string(20) "libedit-20160903-3.1"
  ["glob":"DirectoryIterator":private]=>
  bool(false)
  ["subPathName":"RecursiveDirectoryIterator":private]=>
  string(0) ""
}

Actual result:
--------------
/tmp $ php /tmp/phpbugtest.php
object(PharData)#1 (4) {
  ["pathName":"SplFileInfo":private]=>
  string(40) "phar:///tmp/foo.tgz/libedit-20170329-3.1"
  ["fileName":"SplFileInfo":private]=>
  string(20) "libedit-20170329-3.1"
  ["glob":"DirectoryIterator":private]=>
  bool(false)
  ["subPathName":"RecursiveDirectoryIterator":private]=>
  string(0) ""
}
object(PharData)#1 (4) {
  ["pathName":"SplFileInfo":private]=>
  string(40) "phar:///tmp/foo.tgz/libedit-20170329-3.1"
  ["fileName":"SplFileInfo":private]=>
  string(20) "libedit-20170329-3.1"
  ["glob":"DirectoryIterator":private]=>
  bool(false)
  ["subPathName":"RecursiveDirectoryIterator":private]=>
  string(0) ""
}

 [2017-11-26 03:39 UTC] zeitgeist at ukr dot net
In addition to this:

I use PharData to decompress downloaded .tar.gz files in an infinite loop.

<?php

// Note: $this->comment() and File:: presumably come from the surrounding Laravel command class.
while (true)
{
    $storagePrefix = '/home/myusername/storage'; // in my script this is another absolute file path
    echo "Downloading...\n";
    file_put_contents($storagePrefix.'/export_xml.tar.gz', file_get_contents('http://localhost/export/1402/630/export_xml.tar.gz')); // real url goes here...

    echo "Decompressing...\n";
    $p = new \PharData($storagePrefix.'/export_xml.tar.gz');
    $p->decompress(); // creates export_xml.tar from the .tar.gz
    $this->comment('Extracting...');
    $p = new \PharData($storagePrefix.'/export_xml.tar');
    $p->extractTo($storagePrefix.'/export_xml');

    // parsing logic goes here...

    echo "Cleaning out...";
    File::deleteDirectory($storagePrefix.'/export_xml');
    File::delete($storagePrefix.'/export_xml.tar.gz');
    File::delete($storagePrefix.'/export_xml.tar');
    echo "Done.\n";

    sleep(3);
}

/*
// lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.2 LTS
Release:        16.04
Codename:       xenial


PHP 7.1.4-1+deb.sury.org~xenial+1 (cli) (built: Apr 11 2017 22:12:32) ( NTS )
Copyright (c) 1997-2017 The PHP Group
Zend Engine v3.1.0, Copyright (c) 1998-2017 Zend Technologies
    with Zend OPcache v7.1.4-1+deb.sury.org~xenial+1, Copyright (c) 1999-2017, by Zend Technologies
*/

?>

On the first loop cycle everything is OK: I get the decompressed files in the export_xml folder, and I delete them at the end of that cycle.
But during the second loop cycle I see the following error:

 [BadMethodCallException]
  Unable to add newly converted phar "/home/myusername/storage/export_xml.tar" to the list of phars, a phar with that name already exists

This error is raised by the line "$p->decompress();". It looks like PharData thinks the export_xml.tar file still exists at that moment, but it does not.


I expect PharData to successfully decompress the newly downloaded .tar.gz file, but that doesn't happen.
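
One thing that may be worth trying in the cleanup step is `Phar::unlinkArchive()`, which the manual describes as removing an archive from disk and from memory, in place of a plain file delete. The sketch below is untested against the PHP versions in this thread. The helper name `forgetArchive()` is hypothetical, and it assumes every `PharData` object for the archive has already been released, since `unlinkArchive()` refuses to run while objects or file handles are still open.

```php
<?php
// Hypothetical helper: drop an archive from disk and from phar's internal list.
// Assumes no PharData object or phar:// stream for $archivePath is still open.
function forgetArchive(string $archivePath): bool
{
    if (!is_file($archivePath)) {
        return false; // nothing on disk to remove
    }
    try {
        return Phar::unlinkArchive($archivePath);
    } catch (PharException $e) {
        // Thrown, for example, when the archive still has open objects.
        return false;
    }
}
```

In the loop above this would mean calling `unset($p)` and then `forgetArchive()` for both `export_xml.tar` and `export_xml.tar.gz`, instead of `File::delete()`.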
 [2018-04-20 08:56 UTC] inuyasha dot smith at dist dot info
I can confirm this bug. In my case I replace a tar.gz file that I previously opened, and reopening it gives me the same output as before. A quick workaround would be to copy the file to a temporary location so that it has a different name and doesn't hit the cache.
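
A minimal sketch of that workaround, assuming the archive is small enough to copy cheaply. The helper name `copyToUniqueName()` and the `phar_` prefix are hypothetical; the original basename is kept at the end of the new name so that the extension is still available for format detection.

```php
<?php
// Hypothetical helper: copy an archive to a path that has not been opened before,
// so that a filename-keyed cache cannot return stale data for it.
function copyToUniqueName(string $archivePath): string
{
    $uniquePath = sys_get_temp_dir() . '/phar_' . bin2hex(random_bytes(8))
        . '_' . basename($archivePath);
    if (!copy($archivePath, $uniquePath)) {
        throw new RuntimeException("Could not copy $archivePath to $uniquePath");
    }
    return $uniquePath;
}

// Usage sketch (paths are illustrative):
// $unique  = copyToUniqueName('/tmp/foo.tgz');
// $archive = new PharData($unique);
// ... read or extract ...
// unset($archive);
// unlink($unique); // remove the temporary copy afterwards
```

If the cache really is keyed by filename for the lifetime of the process, each unique name presumably adds another entry, so this may be a poor fit for long-running loops.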
 
Last updated: Thu Apr 02 22:01:24 2020 UTC