Bug #59281 apc zend_bailout deadlock
Submitted: 2010-06-24 17:36 UTC Modified: 2016-11-18 21:52 UTC
Votes:39
Avg. Score:4.6 ± 0.9
Reproduced:24 of 26 (92.3%)
Same Version:20 (83.3%)
Same OS:22 (91.7%)
From: askalski at gmail dot com Assigned:
Status: Wont fix Package: APC (PECL)
PHP Version: 5.3.2 OS: Linux
Private report: No CVE-ID: None
 [2010-06-24 17:36 UTC] askalski at gmail dot com
Description:
------------
zend_bailout() will deadlock APC (and consequently the entire web server) if it longjmps out of an APC critical section while a lock is held.

There are two "surprise" bailouts that the APC code does not account for:

- max_execution_time: This longjmps out of a signal handler to the nearest zend_catch {} block.

- memory_limit: This longjmps out of a failed Zend allocation.
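To make the failure mode concrete, here is a minimal C sketch (not APC or Zend code) of a bailout skipping the unlock. It assumes a plain setjmp/longjmp pair stands in for zend_bailout() and a C11 atomic_flag spinlock stands in for APC's cache lock; all names are hypothetical.

```c
#include <setjmp.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: the engine's current catch point and a
   minimal spinlock playing the role of APC's cache lock. */
static jmp_buf bailout_target;
static atomic_flag cache_lock = ATOMIC_FLAG_INIT;

static void cache_lock_acquire(void) {
    while (atomic_flag_test_and_set(&cache_lock)) {
        /* spin: with a leaked lock this loop never ends (the deadlock) */
    }
}
static void cache_lock_release(void) { atomic_flag_clear(&cache_lock); }

/* Takes the lock only if it is free; false means someone still holds it. */
static bool cache_lock_try(void) { return !atomic_flag_test_and_set(&cache_lock); }

/* Models zend_bailout(): jump to the nearest catch point, skipping
   any cleanup between here and there. */
static void simulated_bailout(void) { longjmp(bailout_target, 1); }

/* A critical section cut short by a timeout or a memory_limit failure. */
static void store_entry(void) {
    cache_lock_acquire();
    simulated_bailout();          /* "surprise" bailout */
    cache_lock_release();         /* never reached */
}

/* Returns true if the lock is still held once the bailout has been caught. */
bool lock_leaked_after_bailout(void) {
    if (setjmp(bailout_target) == 0)
        store_entry();
    /* Back at the catch point: probe the lock without blocking. */
    if (cache_lock_try()) {
        cache_lock_release();
        return false;
    }
    return true;
}
```

After this runs, any other caller of cache_lock_acquire() spins forever, which is the deadlock described above. When the lock lives in shared memory across Apache prefork children, every child that touches the cache hangs the same way.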

My previous workaround was to patch zend_bailout and the HANDLE_BLOCK_INTERRUPTIONS and HANDLE_UNBLOCK_INTERRUPTIONS macros, to defer the longjmp until HANDLE_UNBLOCK_INTERRUPTIONS is called.  Not completely foolproof, but it worked well enough with APC 3.0.19.
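A rough model of the deferral idea follows. It is an assumption about the general shape of such a patch, not the patch itself: a nesting counter plays the role of HANDLE_BLOCK_INTERRUPTIONS/HANDLE_UNBLOCK_INTERRUPTIONS, and a bailout requested inside a blocked section is recorded and only delivered by the outermost unblock. All identifiers are hypothetical.

```c
#include <setjmp.h>
#include <signal.h>

/* Hypothetical model of a deferred-bailout workaround (not the actual patch). */
static jmp_buf catch_point;
static volatile sig_atomic_t block_depth = 0;     /* blocked-section nesting */
static volatile sig_atomic_t bailout_pending = 0; /* a bailout is waiting */

static void block_interruptions(void) { block_depth++; }

/* The outermost unblock delivers any bailout postponed in the meantime. */
static void unblock_interruptions(void) {
    if (--block_depth == 0 && bailout_pending) {
        bailout_pending = 0;
        longjmp(catch_point, 1);
    }
}

/* Patched bailout: inside a blocked section, only record the request. */
static void deferred_bailout(void) {
    if (block_depth > 0) {
        bailout_pending = 1;
        return;
    }
    longjmp(catch_point, 1);
}

static volatile int section_completed;

/* Returns 1 if the critical section ran to completion before the
   bailout (triggered half-way through it) was finally delivered. */
int run_with_deferred_bailout(void) {
    section_completed = 0;
    if (setjmp(catch_point) == 0) {
        block_interruptions();
        deferred_bailout();          /* e.g. the timeout fires mid-section */
        section_completed = 1;       /* the shared-memory update still finishes */
        unblock_interruptions();     /* the postponed longjmp happens here */
        return -1;                   /* not reached */
    }
    return section_completed;
}
```

The weak spot is visible in deferred_bailout(): it returns to its caller, so any code that assumed a bailout never returns keeps running.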

However, because APC 3.1.x moves the old_compile_file() call to within a HANDLE_BLOCK_INTERRUPTIONS block, my workaround is no longer viable.  The common case that breaks it is when the compile fails on parse error -- if that zend_bailout is not handled immediately, old_compile_file returns SUCCESS, and APC caches a half-baked opcode array.

Because of how max_execution_time works, there is no way to predict when a zend_bailout will happen.  Therefore, the only safe solution I can think of is to wrap all APC critical sections in zend_try/zend_catch.
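A sketch of that approach, with setjmp/longjmp standing in for zend_try/zend_catch and a hypothetical current_bailout pointer standing in for EG(bailout). The catch block releases the lock only if it was taken, restores the outer catch point, and re-raises the bailout. Restoring the shared memory to a consistent state, which the expected result below also asks for, would need additional undo logic that is not shown here.

```c
#include <setjmp.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for EG(bailout): the innermost catch point. */
static jmp_buf *current_bailout = NULL;
static atomic_flag cache_lock = ATOMIC_FLAG_INIT;  /* stand-in for APC's lock */

static void cache_lock_acquire(void) { while (atomic_flag_test_and_set(&cache_lock)) { } }
static void cache_lock_release(void) { atomic_flag_clear(&cache_lock); }

static void bailout(void) { longjmp(*current_bailout, 1); }

/* Critical section guarded in the spirit of zend_try/zend_catch: on a
   bailout, release what we hold, restore the outer catch point, re-raise. */
static void guarded_store(bool fail) {
    jmp_buf local;
    jmp_buf *outer = current_bailout;
    volatile bool locked = false;   /* volatile: read after longjmp */

    current_bailout = &local;
    if (setjmp(local) == 0) {
        cache_lock_acquire();
        locked = true;
        if (fail)
            bailout();              /* e.g. max_execution_time fires here */
        cache_lock_release();
        locked = false;
        current_bailout = outer;
    } else {
        current_bailout = outer;
        if (locked)
            cache_lock_release();   /* undo before propagating */
        bailout();                  /* pass the error up the stack */
    }
}

/* True if the bailout reached the outer handler and the lock is free. */
bool bailout_propagates_without_leak(void) {
    jmp_buf top;
    current_bailout = &top;
    if (setjmp(top) == 0) {
        guarded_store(true);
        return false;               /* not reached: a bailout is expected */
    }
    current_bailout = NULL;
    if (atomic_flag_test_and_set(&cache_lock))
        return false;               /* lock was leaked */
    cache_lock_release();
    return true;
}
```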

I originally reported this bug against PHP (bug #46025) two years ago, but I now believe the correct fix belongs in the APC code.


Reproduce code:
---------------
<?php
// Original test system is PHP + Apache2 prefork
// Request this script repeatedly until it deadlocks
set_time_limit(1);
for ($i = 0; $i < 10000000; $i++) {
        apc_store('my_key0', 'value0', 5);
        apc_store('my_key1', 'value1', 5);
}
?>


Expected result:
----------------
APC gracefully handles the zend_bailout, restores the shared memory segment to a consistent state, and releases all locks, before passing the error up the stack.

Actual result:
--------------
APC deadlocks the web server.

Patches

APC-3.1.9-bailout_deadlock.patch (last revision 2012-07-03 17:58 UTC by askalski at gmail dot com)

History

 [2010-06-24 21:21 UTC] rasmus@php.net
This is what the exit_on_timeout setting is for. If your locking mechanism has owner death protection, setting this will prevent a deadlock.
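Assuming "owner death protection" refers to something like a POSIX robust mutex (the report does not say which lock type a given APC build used), the mechanism looks roughly like this: a process-shared robust mutex in shared memory, a child that exits while holding it, and a parent whose next lock call returns EOWNERDEAD instead of hanging. This needs a platform that supports robust mutexes, such as Linux with glibc; the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent saw EOWNERDEAD after a child died holding the
   lock, 0 if not, -1 on setup failure. */
int owner_death_detected(void) {
    /* The mutex must live in memory shared by all worker processes. */
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(m, sizeof *m);
        return -1;
    }
    if (pid == 0) {
        pthread_mutex_lock(m);      /* child takes the lock ... */
        _exit(0);                   /* ... and exits while holding it */
    }
    waitpid(pid, NULL, 0);

    int rc = pthread_mutex_lock(m); /* a non-robust mutex would typically hang here */
    int recovered = (rc == EOWNERDEAD);
    if (recovered)
        pthread_mutex_consistent(m);  /* lock usable again; guarded data may not be */
    if (rc == 0 || recovered)
        pthread_mutex_unlock(m);
    pthread_mutex_destroy(m);
    munmap(m, sizeof *m);
    return recovered;
}
```

pthread_mutex_consistent() only makes the lock usable again; it says nothing about the data the lock was protecting, which is exactly the concern raised in the next comment.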
 [2010-06-24 22:54 UTC] askalski at gmail dot com
Rasmus,

We're still on 5.2.x in production, but I will keep the exit_on_timeout setting in mind for when we do roll out 5.3.x.  It can replace a local patch we created to solve an unrelated issue involving MySQL persistent connections.

For the APC issue, my concern is that although exit_on_timeout would break the deadlock, it could leave the APC shared memory structure in a corrupt state (particularly if SIGPROF happens to fire during sma_allocate or sma_deallocate).
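A toy illustration of why breaking the deadlock does not restore consistency. The free-list structure below is hypothetical and much simpler than APC's shared memory allocator: an allocation takes two separate writes, and a process that dies between them leaves the header's byte count disagreeing with the list.

```c
#include <stddef.h>

/* Hypothetical, much-simplified shared-memory allocator state. */
struct block { size_t size; struct block *next; };
struct sma_header { size_t free_bytes; struct block *free_list; };

/* Invariant the allocator relies on: header total == sum of free blocks. */
int sma_consistent(const struct sma_header *h) {
    size_t sum = 0;
    for (const struct block *b = h->free_list; b != NULL; b = b->next)
        sum += b->size;
    return sum == h->free_bytes;
}

/* Takes the first free block. 'die_midway' models the process being
   terminated (e.g. by exit_on_timeout) between the two writes. */
struct block *sma_take_first(struct sma_header *h, int die_midway) {
    struct block *b = h->free_list;
    if (b == NULL)
        return NULL;
    h->free_list = b->next;       /* write 1: unlink the block */
    if (die_midway)
        return b;                 /* process gone: write 2 never happens */
    h->free_bytes -= b->size;     /* write 2: fix up the accounting */
    return b;
}
```

Whatever lock protected these two writes, releasing it after the process is gone leaves the next reader looking at a header that no longer matches the list.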

I'll experiment a little and see if I can confirm that suspicion.

Thanks,

Andy Skalski
 [2010-06-25 02:35 UTC] askalski at gmail dot com
I was able to confirm that the proposed workaround (exit_on_timeout) leads to corruption.

First, I patched sapi_child_terminate support into apache2handler (it simply calls kill(getpid(), AP_SIG_GRACEFUL_STOP)).

Then I built APC with --disable-apc-pthreadmutex, to make it use fcntl locking instead.
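For reference, fcntl record locks are released by the kernel when the owning process exits, which is why this build does not deadlock when a child dies mid-request. The sketch below demonstrates that with a temporary file and a forked child that exits while holding a write lock; the file path and helper names are assumptions, not what APC uses.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Whole-file write-lock request; cmd is F_SETLKW, F_SETLK or F_GETLK. */
static int file_lock(int fd, int cmd, struct flock *fl) {
    *fl = (struct flock){0};
    fl->l_type = F_WRLCK;
    fl->l_whence = SEEK_SET;
    fl->l_start = 0;
    fl->l_len = 0;                          /* 0 means "to end of file" */
    return fcntl(fd, cmd, fl);
}

/* Returns 1 if a child was seen holding the lock and the kernel released
   that lock when the child exited without unlocking. */
int fcntl_lock_released_on_exit(void) {
    char path[] = "/tmp/apc_lock_demo_XXXXXX";  /* hypothetical path */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                           /* keep only the open descriptor */

    int ready[2], go[2];
    if (pipe(ready) < 0 || pipe(go) < 0)
        return -1;

    struct flock fl;
    char c;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                         /* child: a request holding the lock */
        close(ready[0]);
        close(go[1]);
        if (file_lock(fd, F_SETLKW, &fl) < 0 || write(ready[1], "L", 1) != 1)
            _exit(1);
        if (read(go[0], &c, 1) < 0)         /* wait for EOF from the parent */
            _exit(1);
        _exit(0);                           /* exits without unlocking */
    }
    close(ready[1]);
    close(go[0]);
    if (read(ready[0], &c, 1) != 1)
        return -1;

    file_lock(fd, F_GETLK, &fl);            /* who holds the lock right now? */
    int held_by_child = (fl.l_type == F_WRLCK && fl.l_pid == pid);

    close(go[1]);                           /* EOF lets the child exit */
    waitpid(pid, NULL, 0);

    int acquired = (file_lock(fd, F_SETLK, &fl) == 0);  /* non-blocking try */
    close(ready[0]);
    close(fd);
    return held_by_child && acquired;
}
```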

Here are some error_log entries I got while alternating between running my "crash.php" script (apc_store/apc_delete until timeout) and refreshing the "User Cache Entries" tab in the apc.php dashboard script.

Andy Skalski


[Fri Jun 25 02:03:39 2010] [notice] Apache/2.2.14 (Ubuntu) PHP/5.3.2 configured -- resuming normal operations
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
[Fri Jun 25 02:04:09 2010] [notice] child pid 29057 exit signal Segmentation fault (11)
[Fri Jun 25 02:04:09 2010] [notice] child pid 29067 exit signal Segmentation fault (11)


[Fri Jun 25 02:04:42 2010] [notice] Apache/2.2.14 (Ubuntu) PHP/5.3.2 configured -- resuming normal operations
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
[Fri Jun 25 02:05:00 2010]  Script:  '/home/askalski/public_html/apc.php'
/home/askalski/work/APC-3.1.3p1/apc_zend.c(38) :  Freeing 0x7FAA04C8FD10 (4 bytes), script=/home/askalski/public_html/apc.php
Last leak repeated 11 times
=== Total 12 memory leaks detected ===


<?php /* crash.php */
// I used apc.shm_size=200 for this test.
$a = array();
for ($i = 1; $i <= 1000; $i++) {
	$a[] = str_pad('', $i);
}
set_time_limit(1);
while (1) {
	shuffle($a);
	for ($i = 0; $i < 1000; $i++) {
		apc_store("my_key.$i", $a, 5);
	}
	foreach ($a as $k => $v) {
		apc_delete("my_key.$k");
	}
}
?>
 [2010-07-14 18:44 UTC] gopalv82 at yahoo dot com
Could you explain what changed in the apc-3.1.x branch that broke your old patch?

Maybe that's where I should be fixing things, to begin with.
 [2010-07-14 20:44 UTC] askalski at gmail dot com
First, I should mention that I don't think my patch is necessarily the "right way" to address the issue of dealing with interruptions safely (from the point of view of keeping the shared memory structure consistent.)  However, it does seem to be enough to keep our systems stable -- execution timeouts are relatively rare, in our case.

Here's what changed between 3.0.19 and 3.1.3p1 that breaks my patch:

In zend_main.c:my_compile_file(), the call to "old_compile_file" was moved inside the HANDLE_BLOCK_INTERRUPTIONS...HANDLE_UNBLOCK_INTERRUPTIONS code block (it gets called via apc_compile_cache_entry).

When there's a syntax error in the .php script, the parser does a zend_throw, which is normally caught by zend_compile_file.  My patch defers the action of zend_throw until HANDLE_UNBLOCK_INTERRUPTIONS, thus preventing zend_compile_file() from catching and properly handling the syntax error.  By the time HANDLE_UNBLOCK_INTERRUPTIONS is reached, the partially-built opcode array has already been stored in the APC cache.
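A small model of that interaction, using hypothetical names and an op count in place of a real opcode array: the compiler has its own catch block, as zend_compile_file() does, but with the deferral in effect its bailout returns instead of jumping, so compile() reports success with a partial result that the caller caches before the deferred bailout finally fires.

```c
#include <setjmp.h>

/* Hypothetical model; op counts stand in for an opcode array. */
static jmp_buf *catch_point;        /* innermost catch, like EG(bailout) */
static int block_depth;             /* > 0 inside a blocked section */
static int bailout_pending;
static int cached_ops;              /* what the "cache" holds; -1 = nothing */

/* Bailout with the deferral workaround: postponed inside a blocked section. */
static void bailout(void) {
    if (block_depth > 0) {
        bailout_pending = 1;
        return;
    }
    longjmp(*catch_point, 1);
}

/* Compiler with its own catch block. Emits 'total' ops and hits a parse
   error at op 'error_at'. Returns the op count on success, -1 on failure. */
static int compile(int total, int error_at) {
    jmp_buf local;
    jmp_buf *outer = catch_point;
    volatile int ops = 0;

    catch_point = &local;
    if (setjmp(local) != 0) {
        catch_point = outer;
        return -1;                  /* FAILURE: nothing should be cached */
    }
    for (int i = 0; i < total; i++) {
        if (i == error_at) {
            bailout();              /* parse error; normally does not return */
            break;                  /* deferred: compilation just stops */
        }
        ops++;
    }
    catch_point = outer;
    return ops;                     /* reported as SUCCESS */
}

/* One request that compiles and caches a script with a parse error.
   'defer' selects whether the compile runs inside a blocked section. */
int cached_after_request(int defer) {
    jmp_buf top;
    catch_point = &top;
    block_depth = 0;
    bailout_pending = 0;
    cached_ops = -1;
    if (setjmp(top) == 0) {
        if (defer)
            block_depth++;                  /* like HANDLE_BLOCK_INTERRUPTIONS */
        int ops = compile(5, 2);
        if (ops >= 0)
            cached_ops = ops;               /* cache store */
        if (defer && --block_depth == 0 && bailout_pending)
            longjmp(*catch_point, 1);       /* like HANDLE_UNBLOCK_INTERRUPTIONS */
    }
    return cached_ops;
}
```

Without the deferral, the same parse error is caught inside compile() and nothing is cached; with it, two of the five ops end up in the cache.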

Let me know if you need any more information.  My main development workstation is down for repairs, so all I can give at this time is this general description.

Thanks.
 [2012-07-03 18:06 UTC] askalski at gmail dot com
In response to an email request, I have attached a patch against APC 3.1.9. (It was originally a 3.1.6 patch, which I updated to apply cleanly; I have not yet tested it against 3.1.9.)

    APC-3.1.9-bailout_deadlock.patch

This works in conjunction with the PHP-side patches to HANDLE_BLOCK_INTERRUPTIONS / HANDLE_UNBLOCK_INTERRUPTIONS on bug #46025.

The APC patch does the following:
- Temporarily unblocks interruptions during the call to old_compile_file.  This ensures that parse errors correctly trigger a bailout.
- Adds missing HANDLE_UNBLOCK_INTERRUPTIONS calls to the error return cases of _apc_store and apc_compile_file.
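The shape of those two changes, sketched with a hypothetical nesting counter in place of the real macros: the compiler call runs with interruptions unblocked, and every return path, including the error path, leaves the blocking level where it found it. This illustrates the described behavior; it is not the contents of APC-3.1.9-bailout_deadlock.patch, and all names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the interruption-blocking nesting level. */
static int block_depth;

static void block_interruptions(void)   { block_depth++; }
static void unblock_interruptions(void) { block_depth--; }

/* Hypothetical compiler hook; returns false on a parse error. */
typedef bool (*compile_fn)(void);
static bool compile_ok(void)          { return true; }
static bool compile_parse_error(void) { return false; }

static bool compile_and_cache(compile_fn old_compile, int *cache_slot) {
    block_interruptions();
    /* ... cache lookup under protection would go here ... */

    unblock_interruptions();       /* first change: compile unblocked, so a */
    bool ok = old_compile();       /* parse-error bailout is not deferred   */
    block_interruptions();

    if (!ok) {
        unblock_interruptions();   /* second change: balance the error return */
        return false;
    }
    *cache_slot = 1;               /* store the compiled entry */
    unblock_interruptions();
    return true;
}
```

The check worth keeping in mind is simply that block_depth returns to its starting value on both the success and the failure path.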
 [2016-11-18 21:52 UTC] kalle@php.net
-Status: Open +Status: Wont fix
 [2016-11-18 21:52 UTC] kalle@php.net
APC is no longer supported in favor of OPcache, which comes bundled with PHP. If you wish to use the user cache, look at PECL/APCu.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Dec 21 15:01:29 2024 UTC