[2010-06-24 17:36 UTC] askalski at gmail dot com
Description:
------------
zend_bailout() will deadlock APC (and consequently the entire web server) if it longjmp's out of an APC critical section while a lock is held.
The APC code does not account for two "surprise" bailouts:
- max_execution_time: This makes a longjmp out of a signal handler, to the nearest zend_catch {} block.
- memory_limit: This will jump out of a failed Zend allocation.
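The failure mode described above can be sketched in plain C. This is only an illustration, not APC or Zend code: `fake_lock`, `fake_unlock`, `critical_section`, and `lock_state_after` are hypothetical names, a flag stands in for the real shared-memory lock, and a single `jmp_buf` plays the role of the engine's bailout target.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-ins: a flag instead of a real APC lock, and one
 * jmp_buf in place of the engine's bailout target. */
static jmp_buf bailout_target;
static int lock_held = 0;

static void fake_lock(void)   { assert(!lock_held); lock_held = 1; }
static void fake_unlock(void) { assert(lock_held);  lock_held = 0; }

/* A critical section that may be interrupted by a bailout (timeout or
 * failed allocation) before it reaches the unlock. */
static void critical_section(int bail)
{
    fake_lock();
    if (bail) {
        longjmp(bailout_target, 1);   /* jumps past fake_unlock() */
    }
    fake_unlock();
}

/* Returns the lock state seen after the section finishes or bails out:
 * 0 = released, 1 = still held (the next locker would wait forever). */
int lock_state_after(int bail)
{
    lock_held = 0;
    if (setjmp(bailout_target) == 0) {
        critical_section(bail);
    }
    return lock_held;
}
```

Calling `lock_state_after(0)` yields 0, while `lock_state_after(1)` yields 1: the jump skipped the unlock, which is the state every later request would block on.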
My previous workaround was to patch zend_bailout and the HANDLE_BLOCK_INTERRUPTIONS and HANDLE_UNBLOCK_INTERRUPTIONS macros, to defer the longjmp until HANDLE_UNBLOCK_INTERRUPTIONS is called. Not completely foolproof, but it worked well enough with APC 3.0.19.
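A minimal sketch of the deferral idea, assuming a nesting counter and a pending flag. It is not the actual patch from bug 46025: `block_interruptions`, `unblock_interruptions`, and `request_bailout` are hypothetical functions standing in for the patched macros and zend_bailout.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf bailout_target;
static int block_depth = 0;      /* nesting depth of blocked sections */
static int bailout_pending = 0;  /* a bailout arrived while blocked */
static int lock_held = 0;        /* flag standing in for a real lock */
static int bailed_out = 0;

/* Stand-in for a patched zend_bailout: defer while blocked. */
static void request_bailout(void)
{
    if (block_depth > 0) {
        bailout_pending = 1;
        return;
    }
    longjmp(bailout_target, 1);
}

static void block_interruptions(void) { block_depth++; }

/* Stand-in for the patched unblock macro: deliver a deferred bailout
 * once the outermost blocked section ends. */
static void unblock_interruptions(void)
{
    assert(block_depth > 0);
    if (--block_depth == 0 && bailout_pending) {
        bailout_pending = 0;
        longjmp(bailout_target, 1);
    }
}

/* Returns 1 when the bailout was delivered and the lock was released. */
int deferred_bailout_demo(void)
{
    block_depth = 0; bailout_pending = 0; lock_held = 0; bailed_out = 0;
    if (setjmp(bailout_target) == 0) {
        block_interruptions();
        lock_held = 1;
        request_bailout();        /* arrives mid-section: deferred */
        lock_held = 0;            /* section completes, lock released */
        unblock_interruptions();  /* deferred jump fires here */
    } else {
        bailed_out = 1;
    }
    return bailed_out && !lock_held;
}
```

In this model `deferred_bailout_demo()` returns 1. The bailout still happens, but only after the lock has been released.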
However, because APC 3.1.x moves the old_compile_file() call to within a HANDLE_BLOCK_INTERRUPTIONS block, my workaround is no longer viable. The common case that breaks it is when the compile fails on parse error -- if that zend_bailout is not handled immediately, old_compile_file returns SUCCESS, and APC caches a half-baked opcode array.
Because of how max_execution_time works, there is no way to predict when a zend_bailout will happen. Therefore, the only safe solution I can think of is to wrap all APC critical sections in zend_try/zend_catch.
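The try/catch wrapping proposed here can be sketched as follows. This is a plain C model of the save, catch, clean up, and re-raise pattern, not the real zend_try/zend_catch macros: `current_bailout`, `bailout`, `guarded_section`, and `guarded_demo` are hypothetical names, and a flag again stands in for the lock.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf *current_bailout = NULL;  /* innermost active handler */
static int lock_held = 0;                /* flag standing in for a lock */
static int reached_outer = 0;

static void bailout(void)
{
    assert(current_bailout != NULL);
    longjmp(*current_bailout, 1);
}

/* Critical section wrapped in its own handler: on bailout it releases
 * the lock, restores the outer handler, and passes the bailout on. */
static void guarded_section(int bail)
{
    jmp_buf local;
    /* volatile so the saved pointer is reliable after longjmp */
    jmp_buf *volatile outer = current_bailout;

    current_bailout = &local;
    lock_held = 1;
    if (setjmp(local) == 0) {
        if (bail) {
            bailout();            /* simulated timeout inside the section */
        }
        current_bailout = outer;
        lock_held = 0;            /* normal exit */
    } else {
        current_bailout = outer;
        lock_held = 0;            /* cleanup on bailout */
        bailout();                /* re-raise to the outer handler */
    }
}

/* Bit 0: lock still held. Bit 1: bailout reached the outer handler. */
int guarded_demo(int bail)
{
    jmp_buf top;

    lock_held = 0;
    reached_outer = 0;
    current_bailout = &top;
    if (setjmp(top) == 0) {
        guarded_section(bail);
    } else {
        reached_outer = 1;
    }
    current_bailout = NULL;
    return (reached_outer << 1) | lock_held;
}
```

`guarded_demo(0)` returns 0 and `guarded_demo(1)` returns 2: the error still propagates up the stack, but the lock is no longer held when it does.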
I originally reported this bug against PHP (46025) two years ago, but now I believe the correct fix is within the APC code.
Reproduce code:
---------------
<?php
// Original test system is PHP + Apache2 prefork
// Request this script repeatedly until it deadlocks
set_time_limit(1);
for ($i = 0; $i < 10000000; $i++) {
    apc_store('my_key0', 'value0', 5);
    apc_store('my_key1', 'value1', 5);
}
?>
Expected result:
----------------
APC gracefully handles the zend_bailout, restores the shared memory segment to a consistent state, and releases all locks, before passing the error up the stack.
Actual result:
--------------
APC deadlocks the web server.
Patches:
--------
APC-3.1.9-bailout_deadlock.patch (last revision 2012-07-03 17:58 UTC by askalski at gmail dot com)
I was able to confirm that the proposed workaround (exit_on_timeout) leads to corruption.

First, I patched sapi_child_terminate support into apache2handler (it just does a kill(getpid(), AP_SIG_GRACEFUL_STOP);) Then I built APC with --disable-apc-pthreadmutex, to make it use fcntl locking instead.

Here are some error_log entries that I got when alternating between my "crash.php" script (apc_store/apc_delete until timeout), and refreshing the "User Cache Entries" tab in the apc.php dashboard script.

Andy Skalski

[Fri Jun 25 02:03:39 2010] [notice] Apache/2.2.14 (Ubuntu) PHP/5.3.2 configured -- resuming normal operations
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
[Fri Jun 25 02:04:09 2010] [notice] child pid 29057 exit signal Segmentation fault (11)
[Fri Jun 25 02:04:09 2010] [notice] child pid 29067 exit signal Segmentation fault (11)
[Fri Jun 25 02:04:42 2010] [notice] Apache/2.2.14 (Ubuntu) PHP/5.3.2 configured -- resuming normal operations
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
/home/askalski/public_html/crash.php(17) : Fatal error - Maximum execution time of 1 second exceeded
[Fri Jun 25 02:05:00 2010] Script: '/home/askalski/public_html/apc.php'
/home/askalski/work/APC-3.1.3p1/apc_zend.c(38) : Freeing 0x7FAA04C8FD10 (4 bytes), script=/home/askalski/public_html/apc.php
Last leak repeated 11 times
=== Total 12 memory leaks detected ===

<?php
/* crash.php */
// I used apc.shm_size=200 for this test.
$a = array();
for ($i = 1; $i <= 1000; $i++) {
    $a[] = str_pad('', $i);
}
set_time_limit(1);
while (1) {
    shuffle($a);
    for ($i = 0; $i < 1000; $i++) {
        apc_store("my_key.$i", $a, 5);
    }
    foreach ($a as $k => $v) {
        apc_delete("my_key.$k");
    }
}
?>

In response to an email request, I attached a patch against APC 3.1.9 (originally a 3.1.6 patch, which I updated to apply cleanly. I have not yet tested it against 3.1.9.)

APC-3.1.9-bailout_deadlock.patch

This works in conjunction with the PHP-side patches to HANDLE_BLOCK_INTERRUPTIONS / HANDLE_UNBLOCK_INTERRUPTIONS on bug #46025.

The APC patch does the following:
- Temporarily unblocks interruptions during the call to old_compile_file. This ensures that parse errors correctly trigger a bailout.
- Adds missing HANDLE_UNBLOCK_INTERRUPTIONS calls to the error return cases of _apc_store and apc_compile_file.
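The first point of the patch can be illustrated with a small C model. This is a sketch under stated assumptions, not code from the patch: `fake_compile`, `caches_broken_result`, and the block/unblock helpers are hypothetical, and the deferral logic assumes the counter-and-pending-flag scheme described for the PHP-side workaround.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf bailout_target;
static int block_depth = 0;
static int bailout_pending = 0;
static int cached_bad_ops = 0;   /* set when a failed compile gets cached */

/* Stand-in for a patched zend_bailout: defer while blocked. */
static void request_bailout(void)
{
    if (block_depth > 0) {
        bailout_pending = 1;
        return;
    }
    longjmp(bailout_target, 1);
}

static void block_interruptions(void) { block_depth++; }

static void unblock_interruptions(void)
{
    assert(block_depth > 0);
    if (--block_depth == 0 && bailout_pending) {
        bailout_pending = 0;
        longjmp(bailout_target, 1);
    }
}

/* Hypothetical compile step: a parse error requests a bailout. If the
 * bailout is deferred, control falls through and the caller sees what
 * looks like a successful compile. */
static int fake_compile(int parse_error)
{
    if (parse_error) {
        request_bailout();
    }
    return 1;
}

/* Returns 1 if a half-built result would have been cached. */
int caches_broken_result(int parse_error, int unblock_during_compile)
{
    block_depth = 0; bailout_pending = 0; cached_bad_ops = 0;
    if (setjmp(bailout_target) == 0) {
        int ok;
        block_interruptions();
        if (unblock_during_compile) unblock_interruptions();
        ok = fake_compile(parse_error);
        if (unblock_during_compile) block_interruptions();
        if (ok && parse_error) cached_bad_ops = 1;  /* caching step */
        unblock_interruptions();
    }
    return cached_bad_ops;
}
```

In this model `caches_broken_result(1, 0)` returns 1, because the parse-error bailout is deferred past the caching step, while `caches_broken_result(1, 1)` returns 0, because the bailout fires immediately during the compile call.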