php-fpm spawns a series of workers. These workers can be stopped gracefully by enabling "process_control_timeout", which grants a grace period after SIGQUIT is received so that the ongoing task can complete before the process terminates.
Unfortunately, there is a series of bugs which cause a freshly started daemon to hang needlessly until the timeout expires.
First problem: SA_RESTART flag causes SIGQUIT to be ignored until first request
The master process sets up signal handling in function fpm_signals_init_child(), setting the SA_RESTART flag.
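The SIGQUIT handler sets a flag (in_shutdown) inside fastcgi.c, but this flag is not checked while the worker is blocked inside the accept() syscall: because of SA_RESTART, accept() is transparently restarted after the handler returns instead of failing with EINTR, so the worker only notices the flag once a request arrives.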
This problem is solved by removing the following line from fpm_signals.c:
act.sa_flags |= SA_RESTART;
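The effect of the flag can be seen outside php-fpm with a minimal sketch. It is not FPM code: a blocking read() on a pipe stands in for accept(), and all names (demo_in_shutdown, demo_on_sigquit, demo_blocked_call_interrupted) are made up for illustration. A helper process sends SIGQUIT while the parent is blocked, then writes one byte so that a restarted call can eventually finish.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the in_shutdown flag mentioned in the report. */
static volatile sig_atomic_t demo_in_shutdown = 0;

static void demo_on_sigquit(int signo)
{
    (void)signo;
    demo_in_shutdown = 1; /* only sets a flag, like the handler described above */
}

static void demo_sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Returns 1 if the blocking call failed with EINTR when SIGQUIT arrived,
 * 0 if it was restarted and only returned once data showed up,
 * -1 on a setup error.
 */
int demo_blocked_call_interrupted(int use_sa_restart)
{
    int fds[2];
    struct sigaction act;
    char byte;
    pid_t helper;
    ssize_t n;
    int saved_errno;

    if (pipe(fds) != 0) return -1;

    memset(&act, 0, sizeof(act));
    act.sa_handler = demo_on_sigquit;
    sigemptyset(&act.sa_mask);
    if (use_sa_restart) act.sa_flags |= SA_RESTART;
    if (sigaction(SIGQUIT, &act, NULL) != 0) return -1;

    demo_in_shutdown = 0;
    helper = fork();
    if (helper < 0) return -1;
    if (helper == 0) {
        close(fds[0]);
        demo_sleep_ms(100);            /* let the parent block in read() */
        kill(getppid(), SIGQUIT);
        demo_sleep_ms(100);
        if (write(fds[1], "x", 1) != 1) _exit(1); /* the "first request" */
        _exit(0);
    }

    close(fds[1]);
    n = read(fds[0], &byte, 1);        /* blocks, like accept() in an idle worker */
    saved_errno = errno;
    close(fds[0]);
    waitpid(helper, NULL, 0);

    return (n == -1 && saved_errno == EINTR) ? 1 : 0;
}
```

Calling demo_blocked_call_interrupted(0) should report that the call was interrupted, so the flag can be checked right away. With demo_blocked_call_interrupted(1) the call keeps blocking after the handler has run and only returns when the byte arrives. The 100 ms delays are a crude way to order the events and are an assumption of this sketch.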
Second problem: (bug-in-a-bug) php_request_shutdown() doesn't properly restore the initial worker process state
The "First problem" occurs only before the worker has executed any PHP script, because after that the call to php_request_startup() causes the signal handlers to be redefined inside the Zend engine (zend_signal.c), which installs them WITHOUT the SA_RESTART flag.
I believe that, for consistency, the counterpart function php_request_shutdown() should restore the signal handlers as they were before the execution of the PHP script, so that the worker "returns" to its original state.
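The save-and-restore idea can be sketched with plain sigaction(). This is not the Zend implementation: the function names below (sketch_install_worker_handler, sketch_request_startup, sketch_request_shutdown, sketch_sigquit_has_sa_restart) are hypothetical and only illustrate how the third argument of sigaction() can capture the previous disposition so that it can be put back later.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static void sketch_worker_handler(int signo) { (void)signo; }
static void sketch_engine_handler(int signo) { (void)signo; }

/* Initial worker state: handler installed WITH SA_RESTART, as the report describes. */
int sketch_install_worker_handler(void)
{
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_handler = sketch_worker_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART;
    return sigaction(SIGQUIT, &act, NULL);
}

/* "Request startup": replace the handler (no SA_RESTART) and remember the old one. */
int sketch_request_startup(struct sigaction *saved)
{
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_handler = sketch_engine_handler;
    sigemptyset(&act.sa_mask);
    return sigaction(SIGQUIT, &act, saved);
}

/* "Request shutdown": put back exactly what was there before startup. */
int sketch_request_shutdown(const struct sigaction *saved)
{
    return sigaction(SIGQUIT, saved, NULL);
}

/* Query the current disposition without changing it. */
int sketch_sigquit_has_sa_restart(void)
{
    struct sigaction cur;
    if (sigaction(SIGQUIT, NULL, &cur) != 0) return -1;
    return (cur.sa_flags & SA_RESTART) != 0;
}
```

With this pairing the worker's signal state is the same before and after a request. Note that restoring the original handler also restores SA_RESTART, so on its own this makes the first problem reproducible after every request unless the flag is removed as well.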
Third problem: Idle clients prevent the worker from terminating gracefully
If a client is holding an open FCGI socket with a worker, this worker won't terminate gracefully even though it is not executing any PHP script.
This behaviour is unneeded and an idle client can be safely dropped.
If you are interested in solving the three problems above, I can provide a pull request. I might need some help with the second point to avoid breaking other SAPIs while modifying the Zend internal behaviour.
1) Freshly start php-fpm with some workers and with process_control_timeout set to e.g. 60s
2) Issue a SIGQUIT to the parent process when all workers are idle (and before they handled any request)
Expected result: All FPM processes should terminate immediately (since they are idle)
Actual result: Processes wait until process_control_timeout expires before terminating
I can reproduce this on versions 7.2.9 through 7.2.11.
Running on Amazon Linux 2 and CentOS 7.
Debug output with process_control_timeout set to 60s on a silent vagrant box:
[14-Nov-2018 17:03:32.323116] DEBUG: pid 22501, fpm_pctl_kill_all(), line 159: [pool www] sending signal 3 SIGQUIT to child 28721
[14-Nov-2018 17:03:32.323118] DEBUG: pid 22501, fpm_pctl_kill_all(), line 168: 201 child(ren) still alive
[14-Nov-2018 17:03:32.323123] DEBUG: pid 22501, fpm_event_loop(), line 419: event module triggered 1 events
[14-Nov-2018 17:04:32.324348] DEBUG: pid 22501, fpm_pctl_kill_all(), line 159: [pool wss.cex.au] sending signal 15 SIGTERM to child 28525