php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #64291 Indeterminate GC of evaled lambda function resources
Submitted: 2013-02-24 14:24 UTC Modified: 2021-11-12 13:57 UTC
Votes:9
Avg. Score:4.9 ± 0.3
Reproduced:9 of 9 (100.0%)
Same Version:1 (11.1%)
Same OS:1 (11.1%)
From: Terry at ellisons dot org dot uk Assigned:
Status: Duplicate Package: Scripting Engine problem
PHP Version: 5.4.12 OS: Ubuntu 12.10
Private report: No CVE-ID: None
Welcome back! If you're the original bug submitter, here's where you can edit the bug or add additional notes.
If you forgot your password, you can retrieve your password here.
Password:
Status:
Package:
Bug Type:
Summary:
From: Terry at ellisons dot org dot uk
New email:
PHP Version: OS:

 

 [2013-02-24 14:24 UTC] Terry at ellisons dot org dot uk
Description:
------------
The Internals thread "(non)growing memory while creating anoymous functions via eval()" see http://marc.info/?t=135990541000003&r=1&w=2 is the background to this report.

Description

Storage allocation and garbage collection of lambda function resources is indeterminate an may or may not lead to memory exhaustion.  This (Example1) script demonstrates the effect:

    <?php
    $x = "";
    while (1) {
        if (isset($argv[1])) $x = str_repeat(" " ,mt_rand(1,50000));
        eval ("\$fun = function() { $x return memory_get_usage(); };");
        echo "Mem usage= {$fun()}\n";
    }

If arg1 is set it dies with memory exhaustion, and is stable if unset.  Replacing the eval with (giving Example2)

    $fun = function() { return memory_get_usage(); };

is also stable so this isn't a resource leakage in the string $x per se.

The issue here is that the closure creates a magic name for the function, in PHP terms

    $function_name = sprintf("\0{closure}%s%p",name,addr);

where the name is the resolved filename of the source where the function was defined and the addr is the absolute address of the function definition within the memory resident copy of the source during the compilation process, for example in Example 2 where $fun is statically defined, on my test this is

    "\0{closure}/tmp/y.php0x7fc90b1fd083"

The compiler creates one entry for "\0{closure}/tmp/y.php0x7fc90b1fd08" in the CG function_table, but the closure DTOR does not delete or GC this entry.  The reason is in this logical: in Example 2 the $fun assignment generates

     ZEND_DECLARE_LAMBDA_FUNCTION    '\0{closure}/tmp/y.php0x7fc90b1fd083'
     ASSIGN                           !2, ~5

that is the function is compiled once but rebound multiple times during execution.

With Example 1, the where the eval is used, the closure uses a name based on the source file and line number where the eval was executed, e.g. "/tmp/y.php(5): eval()'d code" giving a magic name for the function of 

    "\0{closure}/tmp/y.php(5): eval()'d code0x0x7fa11ccb9ef7"

where the addr is the absolute memory location in the string being evaluated.

Hence in this scenario, each evaluation creates a new entry in the function_table, even though the closure is subsequently DTORed. In many ways this is the same behaviour are similar to create_function which generates magic names "\0Lambda%d" where the integer is the # of the lamda generated and again these build up in the function_table and are not GC'ed.

The interesting Q is why isn't this always the behaviour? The reason is that the allocator includes an optimisation whereby if a string with an RC=1 is being replaced by a string of the same size then the memory is reused. If the new string contains a new closure function starting at the same offset then by accident the magic name will be the same as the previous (and different function). The compilation invokes the function zend_do_begin_function_declaration() and here the !is_method path does a zend_hash_update on the CG function_table. As the names happen to be the same, the update executes the function table DTOR on the previous entry cleaning it up.

This accidental cleanup seems like a bug.  I'll try to find an exploitable example.


Patches

Add_sequence_count_to_build_runtime_defined_function_key (last revision 2013-11-21 14:59 UTC by Terry at ellisons dot org dot uk)

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2013-11-20 15:56 UTC] Terry at ellisons dot org dot uk
This bug is cascading to another: https://bugs.php.net/bug.php?id=65915 -- consider this test script:

<?php
$tmp = '/tmp/testNg3hUy';
foreach (['a','b'] as $f) {
  file_put_contents($tmp, '<?php return function(){ return "'.$f.'"; };');
  echo file_get_contents($tmp), "\n";
  $$f = require $tmp;
}
printf( "%s, %s\n ", $a(), $b());
unlink($tmp);

The require assignment should generate the equivalent of 

$a = function(){ return "a"; };
$a = function(){ return "b"; };

so that the printf should generate 

a, b

However it actually generates

a, a

demonstrating that the "\0{closure}%s%p" naming convention does not generate unique closure names.
 [2013-11-20 15:58 UTC] Terry at ellisons dot org dot uk
Sorry the second line above should read:

$b = function(){ return "b"; };
 [2013-11-21 15:07 UTC] Terry at ellisons dot org dot uk
The issue here is that the runtime defined function keys should be unique, and the current algo does not generate unique keys.  There are many approaches that could be taken to remove such incorrect name clashes.  The patch that I've submitted add a sequence count to the key -- simple but sufficient to prevent this bug.
 [2013-11-22 12:34 UTC] Terry at ellisons dot org dot uk
Just to note that my patch is not strong enough when used with OPcache, since a simple static running count can fail when the interpreter is forked.  Better alternatives include 

* some form of unique id, e.g. a UUID or uniquid(true)
* a content based hash, such as md5file(__FILE__) -- though if this were to be adopted then it would be better always to generate this as part of the compile creating say __FILE_MD5__, though this would add a few % to the compile time.

This all needs wider discussion of these issues on the Internals ML.
 [2013-11-25 08:53 UTC] dmitry@php.net
Your second script prints "a, a" only with OPCache, because it caches the included temporary file. Without OPCache it prints the expected "a, b".

I also don't think that the first script indicates a bug. The more functions you create the more memory it requires.
 [2013-11-25 12:42 UTC] Terry at ellisons dot org dot uk
What threw me was the botch with the temporary entries "\0{closure}$filenane$offset" are used in the EG(function_table).  This is as clear as mud.  When a file is compiled, a function table entry is created for each closue in the source.  This entry is never executed directly, but is used by the ZEND_DECLARE_LAMBDA_FUNCTION to construct the closure object which contains a deep copy of this zend_function record.  It is this copy that used when the closure is called.  So long as the ZEND_DECLARE_LAMBDA_FUNCTION
are executed within the same scope as the compile, this should normally be unique, but it is quite easy to construct a test case which the unique assumption fails: 

--TEST--
ISSUE #65915 Temporary function entries for closures are not unique
--INI--
opcache.enable=0
--SKIPIF--
--FILE--
<?php
$tmp = tempnam(__DIR__, 'test');
foreach (['a','b'] as $f) {
  file_put_contents($tmp, "<?php function $f() {return function(){ return '$f'; };}");
  echo file_get_contents($tmp), "\n";
  require $tmp;
}
$a = a();
$b = b();
printf( "%s, %s\n ", $a(), $b());
unlink($tmp);
?>
--CLEAN--	
--EXPECT--
<?php function a() {return function(){ return 'a'; };}
<?php function b() {return function(){ return 'b'; };}
a, b
 [2013-11-25 13:24 UTC] Terry at ellisons dot org dot uk
And here's the eval version:

--TEST--
ISSUE #64291 Temporary function entries for closures are not unique
--INI--
opcache.enable=1
--FILE--
<?php
foreach (['a','b'] as $f) {
  $tmp = "function $f() {return function(){ return '$f'; };}\n";
  eval($tmp);
  echo $tmp;
}
$a = a();
$b = b();
printf( "%s, %s\n ", $a(), $b());
?>
--EXPECT--
function a() {return function(){ return 'a'; };}
function b() {return function(){ return 'b'; };}
a, b
 [2013-11-25 13:26 UTC] Terry at ellisons dot org dot uk
Sorry, the above should read opcache.enable=0 though the failure is the same for opcache enabled and not enabled in the eval case.
 [2013-12-05 16:15 UTC] Terry at ellisons dot org dot uk
Dmitry, 

I have just realised that this "mangled names should be unique" issue applied to any runtime bound function or class as the following -- albeit perverse example shows:

--TEST--
ISSUE #65915A Temporary class entries are not unique
--INI--
opcache.enable=0
--SKIPIF--
--FILE--
<?php
$tmp = tempnam(__DIR__, 'test');
foreach (['a','b'] as $f) {
  file_put_contents($tmp, <<<END
<?php 
  function $f() {
    class Hello { const WORLD = "Hello world from $f\\n"; }
  }
END
);
  require $tmp;
}
a();
echo Hello::WORLD;
unlink($tmp);
?>
--CLEAN--	
--EXPECT--
Hello world from a

Here the two functions both compile a class with a mangled name "\0$class$filename$string_addr" which is the same for the a() and b() copies
so b() version overwrites the a() one, and the DECLARE_CLASS opcode in a() incorrecly binds to the wrong class, hence Hello::WORLD incorrectly prints out the "from b" version.

However, I suspect in practice that this is unlikely to manifest itself in real word apps.
 [2014-03-17 18:58 UTC] ulrich dot eckhardt at base-42 dot de
I'm getting bitten by this bug, too. My use case is basically that I'm generating and eval()'ing code. Since this is not in a one-shot HTTP server process environment but in a longer running process that caches the generated code for performance reasons, this causes various failures that depend on the order in which requests are made.

Using Terry's patch on PHP 5.4.24 on Debian/AMD64 caused the problem to go away, my services now run stable. What's keeping you from applying this patch (apart from the fact that it mixes tabs/spaces)?
 [2021-11-12 13:57 UTC] nikic@php.net
-Status: Open +Status: Duplicate
 [2021-11-12 13:57 UTC] nikic@php.net
Closing this is a duplicate of bug #76982. PHP 8.1 mostly (but not completely) fixes the leak, and the RTD key collision issue has already been addressed previously (in some 7.x version I believe).
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 14 16:01:26 2024 UTC