php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #76895 PHP-FPM7.x can't finish reloading after encountering a fatal error
Submitted: 2018-09-17 08:48 UTC Modified: 2020-01-12 16:45 UTC
Votes:9
Avg. Score:4.4 ± 0.7
Reproduced:9 of 9 (100.0%)
Same Version:7 (77.8%)
Same OS:4 (44.4%)
From: florisjan dot gerrits at gmail dot com Assigned: bukka (profile)
Status: Closed Package: FPM related
PHP Version: 7.2.10 OS: Ubuntu 18.04 bionic
Private report: No CVE-ID: None
 [2018-09-17 08:48 UTC] florisjan dot gerrits at gmail dot com
Description:
------------
What you did: Browsed to a website using break/continue outside a loop and afterwards, I reloaded PHP-FPM.
What I want to happen: I want PHP-FPM to successfully reload
What actually happened: PHP-FPM got stuck in 'Waiting on reload...' and forced me to either restart or manually send a SIGKILL to the child that got stuck.

(Force) Moving websites from PHP5.6 to PHP7.x made me to run into this bug. A lot of old websites use 'break' and 'continue' options in if-constructions (outside loops). An example:

2018/09/13 11:25:59 [error] 7861#7861: *8603 FastCGI sent in stderr: "PHP message: PHP Fatal error:  'break' not in the 'loop' or 'switch' context in /home/florisjantest.nl/public_html/index.php on line 5" while reading response header from upstream, client: 0.0.0.0, server: florisjantest.nl, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.florisjantest.nl.socket:", host: "florisjantest.nl"

This Fatal error triggers the PHP-FPM child to become unresponsive and ignore any and all signals except for SIGKILL. Trying to reload PHP-FPM will cause the master to get stuck in 'waiting on reload' status and will not allow any new connections. 


Technical:

What you did: trigger the error and reload PHP-FPM
What I want to happen:

[pid  6324] kill(7893, SIGQUIT)         = 0
[pid  6324] getpid()                    = 6324
[pid  6324] getpid()                    = 6324
[pid  6324] epoll_wait(8,  <unfinished ...>
[pid  7893] <... accept resumed> 0x7ffd46708d70, [112]) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid  7893] --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=6324, si_uid=0} ---
[pid  7893] close(0)                    = 0
[pid  7893] socket(AF_UNIX, SOCK_STREAM, 0) = 0
[pid  7893] rt_sigreturn({mask=[]})     = 43
[pid  7893] accept(0, 0x7ffd46708d70, [112]) = -1 EINVAL (Invalid argument)
[pid  7893] munmap(0x7ffab1d13000, 67108864) = 0
[pid  7893] close(3)                   

What actually happened:

[pid 22815] kill(8176, SIGQUIT)         = 0
[pid 22815] getpid()                    = 22815
[pid 22815] getpid()                    = 22815
[pid 22815] epoll_wait(9,  <unfinished ...>
[pid  8176] <... accept resumed> 0x7ffd81ca33d0, [112]) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid  8176] --- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=22815, si_uid=0} ---
[pid  8176] rt_sigreturn({mask=[]})     = -1 EINTR (Interrupted system call)
[pid  8176] accept(10,  <unfinished ...>
[pid 22815] <... epoll_wait resumed> [], 1, 1000) = 0
[pid 22815] getpid()                    = 22815




Loaded modules:
# php-fpm7.2 -m
[PHP Modules]
bcmath
bz2
calendar
cgi-fcgi
Core
ctype
curl
date
dom
exif
fileinfo
filter
ftp
gd
gettext
hash
iconv
imap
intl
json
ldap
libxml
mbstring
mysqli
mysqlnd
openssl
pcre
PDO
pdo_mysql
pdo_pgsql
pdo_sqlite
pgsql
Phar
posix
readline
Reflection
session
shmop
SimpleXML
soap
sockets
sodium
SPL
sqlite3
standard
sysvmsg
sysvsem
sysvshm
tokenizer
wddx
xml
xmlreader
xmlwriter
xsl
Zend OPcache
zip
zlib

[Zend Modules]
Zend OPcache

Test script:
---------------
<?php
$variabele = 'TEST';

if (!empty($variabele)) {
  break;
}

Actual result:
--------------
After requesting the above mentioned script the following situation will occur:

The master process issues a SIGQUIT every 10 seconds:

epoll_wait(9, [], 1, 1000)              = 0
getpid()                                = 22080
kill(22106, SIGQUIT)                    = 0
getpid()                                = 22080
getpid()                                = 22080
epoll_wait(9, [], 1, 1000)              = 0
getpid()                                = 22080
getpid()                                = 22080
getpid()                                = 22080
epoll_wait(9, [], 1, 992)               = 0
getpid()                                = 22080
socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 10
getsockopt(10, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
setsockopt(10, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = 0
getuid()                                = 0
geteuid()                               = 0
getgid()                                = 0
getegid()                               = 0
sendmsg(10, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, msg_namelen=21, msg_iov=[{iov_base="READY=1\nSTATUS=Processes active: 0, idle: 1, Requests: 1, slow: 0, Traffic: 0req/sec", iov_len=84}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 84

An strace on the child process shows the following:

accept(10, 0x7fff77012e90, [112])       = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=22080, si_uid=0} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
accept(10,



The situation changes when you initiate a reload of PHP-FPM. The reload will signal SIGUSR2, it sends another SIGQUIT to the child and after 10 seconds it will send a SIGTERM. This is where no further signals are given to the child. No new SIGQUITs no new SIGTERMs and no SIGKILLs (although that's probably okay). The broken child is still active and upholding the master from reloading:

# systemctl reload php7.2-fpm.service
# ps axuf | grep php
root     22697  0.0  0.2  14432  1020 pts/0    S+   11:40   0:00          \_ grep php
root     22080  0.0  6.1 486688 30532 ?        Ss   11:25   0:00 php-fpm: master process (/etc/php/7.2/fpm/php-fpm.conf)
florisja 22106  0.0  3.8 489292 18984 ?        S    11:25   0:00  \_ php-fpm: pool florisjantest.nl


Reload in master: 

epoll_wait(9, 0x55fbc277db10, 1, 1000)  = -1 EINTR (Interrupted system call)
--- SIGUSR2 {si_signo=SIGUSR2, si_code=SI_USER, si_pid=22692, si_uid=0} ---
getpid()                                = 22080
write(6, "2", 1)                        = 1
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
getpid()                                = 22080
epoll_wait(9, [{EPOLLIN, {u32=3228071840, u64=94539753224096}}], 1, 773) = 1
read(5, "2", 1)                         = 1
write(4, "[13-Sep-2018 11:40:35] NOTICE: Reloading in progress ...\n", 57) = 57
kill(22106, SIGQUIT)                    = 0
read(5, 0x7fff77012e77, 1)              = -1 EAGAIN (Resource temporarily unavailable)
getpid()                                = 22080
getpid()                                = 22080
epoll_wait(9, [], 1, 772)               = 0
getpid()                                = 22080
getpid()                                = 22080
getpid()                                = 22080
epoll_wait(9, [], 1, 1000)              = 0
getpid()                                = 22080
getpid()                                = 22080
getpid()                                = 22080
epoll_wait(9, [], 1, 1000)              = 0
getpid()                                = 22080
getpid()                                = 22080
getpid()                                = 22080
epoll_wait(9, [], 1, 795)               = 0
getpid()                                = 22080
socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 10
getsockopt(10, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
setsockopt(10, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = 0
getuid()                                = 0
geteuid()                               = 0
getgid()                                = 0
getegid()                               = 0
sendmsg(10, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, msg_namelen=21, msg_iov=[{iov_base="READY=1\nSTATUS=Processes active: 0, idle: 2, Requests: 1, slow: 0, Traffic: 0req/sec", iov_len=84}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 84
close(10)                               = 0
getpid()                                = 22080
getpid()                                = 22080
epoll_wait(9, [], 1, 202)               = 0
getpid()                                = 22080
getpid()                                = 22080
getpid()                                = 22080
epoll_wait(9, [], 1, 1000)              = 0
getpid()                                = 22080
getpid()                                = 22080
getpid()                                = 22080
epoll_wait(9, [], 1, 222)               = 0
kill(22106, SIGTERM)                    = 0
getpid()                                = 22080
getpid()                                = 22080
epoll_wait(9, [], 1, 777)               = 0

Reload in child:

--- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=22080, si_uid=0} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
accept(10, 0x7fff77012e90, [112])       = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=22080, si_uid=0} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
accept(10,


As mentioned above, no new SIGQUIT signals will follow. The master gets the status 'Waiting on reload...' and doesn't seem to think the child is still there.

Initiating a restart of PHP-FPM will show the following:

# systemctl restart php7.2-fpm.service
# ps axuf
root     22815  0.0  6.1 486688 30352 ?        Ss   11:47   0:00 php-fpm: master process (/etc/php/7.2/fpm/php-fpm.conf)



Master: 

--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
getpid()                                = 22080
write(6, "T", 1)                        = 1
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=1, si_uid=0} ---
getpid()                                = 22080
epoll_wait(9, [{EPOLLIN, {u32=3228071840, u64=94539753224096}}], 1, 968) = 1
read(5, "T", 1)                         = 1
write(4, "[13-Sep-2018 11:47:54] NOTICE: Terminating ...\n", 47) = 47
kill(22106, SIGTERM)                    = 0
read(5, 0x7fff77012e77, 1)              = -1 EAGAIN (Resource temporarily unavailable)
......
kill(22106, SIGKILL)                    = 0
getpid()                                = 22080
getpid()                                = 22080
epoll_wait(9, 0x55fbc277db10, 1, 968)   = -1 EINTR (Interrupted system call)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=22106, si_uid=1003, si_status=SIGKILL, si_utime=0, si_stime=1} ---
getpid()                                = 22080
write(6, "C", 1)                        = 1
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
getpid()                                = 22080
epoll_wait(9, [{EPOLLIN, {u32=3228071840, u64=94539753224096}}], 1, 963) = 1
read(5, "C", 1)                         = 1
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], WNOHANG|WSTOPPED, NULL) = 22106
write(4, "[13-Sep-2018 11:47:59] NOTICE: exiting, bye-bye!\n", 49) = 49


Child:

0x7fff77012e90, [112])       = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=1, si_uid=0} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
accept(10, 0x7fff77012e90, [112])       = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=22080, si_uid=0} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
accept(10,  <unfinished ...>)           = ?
+++ killed by SIGKILL +++

Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2018-09-17 09:10 UTC] florisjan dot gerrits at gmail dot com
Note: I tested this bug in:
PHP-FPM7.0, 7.1 and 7.2 

on a variety of distributions:

Distribution: Debian 9 stretch 
Kernel:  Debian 3.16.51-3+deb8u1 (2018-01-08) x86_64 GNU/Linux

Distribution: Ubuntu 18.04 bionic
Kernel: 4.15.0-33-generic

Distribution: CentOS 7 Core
Kernel: 3.10.0-862.11.6.el7.x86_64

Distribution: Ubuntu 14.04.5 LTS
Kernel: 4.4.0-134-generic

Not sure if that adds anything to the story. But I just wanted to make sure that this is reproducible on all PHP-FPM7.x versions and all distributions I had at hand.
 [2018-10-16 11:03 UTC] mnikulin at plesk dot com
Apparent negative consequences of this bug can be mitigated
by the patches for
Bug #74083 	master PHP-fpm is stopped on multiple reloads
and
Bug #76601 	Partially working php-fpm ater incomplete reload

https://bugs.php.net/patch-display.php?bug_id=76601&patch=php-76601_kill-not-rescheduled_7.2.11_2018-09-28.patch&revision=latest
ensures that SIGKILL is sent to the child during reload.

The root cause is that after syntax error the worker ignores
SIGQUIT and SIGTERM even though zend_signal_handler_defer()
is invoked in the child process.

I suspect that some handler is not restored after
call to zend_error_noreturn() in
Zend/zend_compile.c:4489

    void zend_compile_break_continue(zend_ast *ast)

or signals are postponed but not processed.

To demonstrate that master process send proper signals during reload,
log verbosity may be increased to debug level in php-fpm.conf

[16-Oct-2018 17:25:15.241230] DEBUG: pid 8247, fpm_pctl(), line 231: switching to 'reloading' state
[16-Oct-2018 17:25:15.241245] DEBUG: pid 8247, fpm_pctl_kill_all(), line 159: [pool example] sending signal 3 SIGQUIT to child 8273
[16-Oct-2018 17:25:15.241252] DEBUG: pid 8247, fpm_pctl_kill_all(), line 168: 1 child(ren) still alive
[16-Oct-2018 17:25:15.241262] DEBUG: pid 8247, fpm_event_loop(), line 419: event module triggered 1 events
[16-Oct-2018 17:25:17.241912] DEBUG: pid 8247, fpm_pctl_kill_all(), line 159: [pool example] sending signal 15 SIGTERM to child 8273
[16-Oct-2018 17:25:17.241973] DEBUG: pid 8247, fpm_pctl_kill_all(), line 168: 1 child(ren) still alive

So zend handlers interfere with signals used for process management.
 [2019-01-23 08:20 UTC] emayoral at arsys dot es
Will not let me vote on the issue, but I am also affected. (https://bugs.php.net/bug.php?id=77443)

Patch suggested by mnikulin at plesk dot com would help , any plans to incorporate the patch at least in php 7.1 and upwards?
 [2019-02-20 04:05 UTC] mnikulin at plesk dot com
To reproduce the problem opcache.so must be loaded.

See the comment added at 2019-02-20 03:59 UTC to Bug #76601
for an unfinished attempt to write a test for this bug.
 [2019-07-23 08:14 UTC] nikic@php.net
It does look like opcache is being rather overzealous with signal blocking in one instance: https://github.com/php/php-src/blob/cef2071eb4298c50983b6a650d615857609f1fb4/ext/opcache/ZendAccelerator.c#L2031

This blocks interruptions during the compile_file call below and if that causes a bailout, I believe we will never unblock interrupts.

We have a sanity check for non-zero signal depth on shutdown in https://github.com/php/php-src/blob/cef2071eb4298c50983b6a650d615857609f1fb4/Zend/zend_signal.c#L342, but apparently the zend_signal_deactive() function is never actually used...
 [2019-09-02 11:18 UTC] mnikulin at plesk dot com
Does not patch
https://github.com/php/php-src/pull/4461
that is included in PHP-7.2.22 and PHP-7.3.9
fixes the problem? Just surprised that the bug
was not mentioned in the release notes. Are there
any complications?
 [2020-01-12 16:45 UTC] bukka@php.net
-Status: Open +Status: Closed -Assigned To: +Assigned To: bukka
 [2020-01-12 16:45 UTC] bukka@php.net
This issue should be fixed in 7.4. Please comment here or create a new bug report if you still see it in 7.4.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Dec 03 16:01:33 2024 UTC