Bug #81017 PharData memory leak
Submitted: 2021-05-06 18:57 UTC
Modified: 2021-05-10 14:36 UTC
Votes:3
Avg. Score:3.7 ± 0.9
Reproduced:3 of 3 (100.0%)
Same Version:1 (33.3%)
Same OS:1 (33.3%)
From: jon dot johnson at ucsf dot edu
Assigned:
Status: Open
Package: PHAR related
PHP Version: 7.4
OS: PHP Docker / OSX 10.15.7
Private report: No
CVE-ID: None
 [2021-05-06 18:57 UTC] jon dot johnson at ucsf dot edu
Description:
------------
When working with a tar file using PharData, memory usage increases and is not released. I first noticed this when using PharData::extractTo(), but it is more easily reproduced with PharData::addEmptyDir(), as I've done below. It seems to be relative to the size of the archive, but I didn't confirm this.

Seems to exist in all versions of PHP, tested with:
docker container run --rm -v $(pwd):/test/ php:5-cli php /test/test.php
docker container run --rm -v $(pwd):/test/ php:7-cli php /test/test.php
docker container run --rm -v $(pwd):/test/ php:8-cli php /test/test.php

and got the same result.

Test script:
---------------
<?php

echo 'Start: ' . memory_get_usage() . "\n\n";
for ($i = 0; $i < 10; $i++) {

    $path = __DIR__ . DIRECTORY_SEPARATOR . $i . '.tar';
    $phar = new PharData($path);
    $phar->addEmptyDir('test');

    unset($phar);
    unlink($path);
    echo "After {$i}: " . memory_get_usage() . "\n";
}
gc_collect_cycles();

echo "\nEnd: " . memory_get_usage() . "\n";


Expected result:
----------------
I would expect each iteration of the loop to be self-contained (even without the manual steps to unset and unlink), and memory consumption to be constant for this script no matter how many iterations are run.

Actual result:
--------------
php test.php
Start: 398248

After 0: 411936
After 1: 424968
After 2: 438000
After 3: 451032
After 4: 464064
After 5: 477096
After 6: 490128
After 7: 503160
After 8: 516512
After 9: 529544

End: 529504
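
The growth in the output above is easier to see as per-iteration deltas. The following is a minimal sketch that only does arithmetic on the numbers already printed; the helper name memory_deltas() is made up for illustration and is not part of the original test script.

```php
<?php
// Readings copied from the "After N" lines of the output above.
$readings = [
    411936, 424968, 438000, 451032, 464064,
    477096, 490128, 503160, 516512, 529544,
];

// Hypothetical helper: difference between each reading and the previous one.
function memory_deltas(array $readings): array
{
    $deltas = [];
    for ($i = 1; $i < count($readings); $i++) {
        $deltas[] = $readings[$i] - $readings[$i - 1];
    }
    return $deltas;
}

$deltas = memory_deltas($readings);
echo implode(', ', $deltas), "\n";

// Most frequent delta, i.e. the typical amount retained per archive.
$counts  = array_count_values($deltas);
$typical = array_search(max($counts), $counts);
echo "Most common delta: {$typical} bytes\n";
```

Eight of the nine deltas are 13032 bytes and one is 13352 bytes, so each archive appears to leave roughly 13 KB behind in this run.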

History

 [2021-05-06 23:24 UTC] hanskrentel at yahoo dot de
If the filename stays the same for each iteration, there is no further increase, and PHP exits with status 137.

PHP also exits with status 137 when the directory entry is deleted right after it was added. Memory stays low then, too.

The behaviour has been present ever since PharData was introduced.

100 runs each, across more php versions with summaries at the bottom:

* 100 file names: https://3v4l.org/lGa2n
* 1 file name: https://3v4l.org/8GaSb (Exit status 137)
* 100 file names, directory entry deleted: https://3v4l.org/OqGCY (Exit status 137)

most common memory offset per file (~94 of 100 times):

PHP 8 (all versions): 12928
PHP 7 (all versions): 12960
PHP 5 (5.4 - 5.6)   :  3032
PHP 5.3             :  3192
 [2021-05-07 11:41 UTC] cmb@php.net
-Status: Open
+Status: Not a bug
-Assigned To:
+Assigned To: cmb
 [2021-05-07 11:41 UTC] cmb@php.net
The memory "leak" is the phar_fname_map[1] which holds one entry
per PharData used during the request, and is only released at the
end of the request.  So there is no memory leak; it's just a
caching mechanism.

> PHP exits with status 137.

That's due to the infinite loop in Metrics::summary() (which looks
like a userland bug) hitting a timeout.

[1] <https://github.com/php/php-src/blob/php-7.4.19/ext/phar/phar_internal.h#L133>
 [2021-05-07 17:30 UTC] jon dot johnson at ucsf dot edu
Changing the size of the tar file increases the amount of memory used. I would expect a map to grow by a constant amount with each entry, not to scale with the file size; otherwise, there should be a way to clear this map when working with multiple files.

Adding an inner loop that adds more files to each archive in my example increases the memory consumed.

for ($j = 0; $j < 10; $j++) {
    $phar->addFromString("test-file-{$j}.php", file_get_contents(__FILE__));
}
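
To check the effect of the inner loop in isolation, a self-contained sketch along the following lines could be used. The function name retained_bytes_per_archive(), the use of the system temp directory, and the 1 KiB payload per file are assumptions made for illustration; the original script used __DIR__ and the script's own source as the payload.

```php
<?php
// Hypothetical helper: creates one tar archive with $files entries, removes it
// again, and reports how much memory is still in use afterwards.
function retained_bytes_per_archive(int $files): int
{
    $path = sys_get_temp_dir() . DIRECTORY_SEPARATOR
        . uniqid('phar81017_') . "_{$files}.tar";

    $before = memory_get_usage();

    $phar = new PharData($path);
    for ($j = 0; $j < $files; $j++) {
        // Fixed 1 KiB payload so only the number of entries varies.
        $phar->addFromString("test-file-{$j}.php", str_repeat('x', 1024));
    }

    unset($phar);
    unlink($path);
    gc_collect_cycles();

    return memory_get_usage() - $before;
}

foreach ([1, 10, 100] as $files) {
    echo $files, ' file(s): ', retained_bytes_per_archive($files), " bytes retained\n";
}
```

The exact numbers depend on the PHP version and platform, so none are given here. The point is only to compare the retained amount across different entry counts.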
 [2021-05-07 17:55 UTC] cmb@php.net
-Status: Not a bug
+Status: Open
 [2021-05-07 17:55 UTC] cmb@php.net
I shall have a closer look.  Thanks!
 [2021-05-10 14:36 UTC] cmb@php.net
-Status: Assigned
+Status: Open
-PHP Version: 8.0.5
+PHP Version: 7.4
-Assigned To: cmb
+Assigned To:
 [2021-05-10 14:36 UTC] cmb@php.net
> Changing the size of the tar file increases the amount of memory
> used.

Indeed, since the phar_fname_map also holds the manifest of the
archive, which has an entry for each file contained in the
archive.

> […]  otherwise there should be a way to clear this map when
> working with multiple files.

I was going to suggest using PharData::unlinkArchive() as a
workaround, but that doesn't work (at least on Windows).  I think
there is a bug report about the underlying issue, but I can't find
it.
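
For reference, an attempt at that workaround might look like the sketch below. It assumes a throwaway archive in the system temp directory (the path is made up for illustration) and falls back to a plain unlink() if the call fails. Whether unlinkArchive() succeeds, and whether it actually releases the cached entry, is exactly what is in doubt here, so this should be read as an experiment rather than a fix.

```php
<?php
// Illustrative path; any writable location would do.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . uniqid('phar81017_unlink_') . '.tar';

$phar = new PharData($path);
$phar->addEmptyDir('test');

// unlinkArchive() requires that no objects still reference the archive.
unset($phar);

try {
    PharData::unlinkArchive($path);
    echo "unlinkArchive() succeeded\n";
} catch (PharException $e) {
    // Reported above as not working, at least on Windows.
    echo 'unlinkArchive() failed: ', $e->getMessage(), "\n";
}

// Fallback cleanup so no file is left behind either way.
if (file_exists($path)) {
    unlink($path);
}
```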

However, I found bug #75101, which is closely related to this ticket.
The caching in phar_fname_map might not be the best idea.

Anyhow, I personally would not use Phar*Data* at all.
 