php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #60961 Graceful Restart (USR2) isn't very graceful
Submitted: 2012-02-02 20:25 UTC Modified: 2013-02-13 15:57 UTC
Votes:32
Avg. Score:4.7 ± 0.6
Reproduced:29 of 29 (100.0%)
Same Version:9 (31.0%)
Same OS:8 (27.6%)
From: phpbugs at oops dot mooo dot com Assigned: fat
Status: Assigned Package: FPM related
PHP Version: 5.3.9 OS: Debian Squeeze
Private report: No CVE-ID:
Have you experienced this issue?
Rate the importance of this bug to you:

 [2012-02-02 20:25 UTC] phpbugs at oops dot mooo dot com
Description:
------------
I just compiled a new PHP+APC with the CVE-2012-0830 fix. It looks like all/some of the the active requests died. I had the same problem when upgrading to 5.3.8 and 5.3.9 too.

# cat /var/log/nginx/error.log
 *empty*
# cat /var/run/php-fpm.pid
2161
# ps aux | fgrep -i 2161
root      2161  0.1  0.2 123692  4520 ?        Ss   06:28   0:00 php-fpm: master process (/opt/php/etc/php-fpm.conf)
# kill -USR2 2161
# cat /var/log/nginx/error.log
2012/02/02 21:08:26 [error] 25004#0: *7381002 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: XXX
2012/02/02 21:08:26 [error] 25004#0: *7381001 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: XXX
2012/02/02 21:08:26 [error] 25004#0: *7372696 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: XXX
2012/02/02 21:08:26 [error] 25004#0: *7381238 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: XXX
2012/02/02 21:08:26 [error] 25004#0: *7374985 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: XXX
2012/02/02 21:08:26 [error] 25004#0: *7369723 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: XXX
2012/02/02 21:08:26 [error] 25004#0: *7360478 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: XXX
2012/02/02 21:08:26 [error] 25004#0: *7371999 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: XXX
2012/02/02 21:08:26 [error] 25004#0: *7375111 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: XXX
2012/02/02 21:08:26 [error] 25004#0: *7381000 readv() failed (104: Connection reset by peer) while reading upstream, client: xx.xx.xx.xx, server: XXX

This gives 502 Bad Gateway on the client.


Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2012-05-23 13:30 UTC] fat@php.net
-Assigned To: +Assigned To: fat
 [2012-07-17 06:40 UTC] megazubr at gmail dot com
Centos 5.8, php 5.4.4 compiled from source with --enable-fpm
When doing USR2 on master pid of php-fpm daemon
1. kill child(502 Bad gatway on long_loop.php in same time)
2. restart daemon with another master pid

As i see, it's not a graceful reload.

master pid - 12788
child(long_loop.php) - 12795

strace -f -s 8000 -p 12788

[pid 12795] munmap(0x2b017edd1000, 1168) = 0
[pid 12795] munmap(0x2b017edd2000, 1168) = 0
[pid 12795] munmap(0x2b017edd3000, 1168) = 0
[pid 12795] munmap(0x2b017edb5000, 112) = 0
[pid 12795] time(NULL)                  = 1342505962
[pid 12795] dup2(1, 2)                  = 2
[pid 12795] close(4)                    = 0
[pid 12795] dup2(13, 0)                 = 0
[pid 12795] geteuid()                   = 0
[pid 12795] setgid(500)                 = 0
[pid 12795] open("/proc/sys/kernel/ngroups_max", O_RDONLY) = 4
[pid 12795] read(4, "65536\n", 31)      = 6
[pid 12795] close(4)                    = 0
[pid 12795] open("/etc/group", O_RDONLY) = 4
[pid 12795] fcntl(4, F_GETFD)           = 0
[pid 12795] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 12795] fstat(4, {st_mode=S_IFREG|0644, st_size=693, ...}) = 0
[pid 12795] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 
-1, 0) = 0x2b017b3ce000
[pid 12795] lseek(4, 0, SEEK_CUR)       = 0
[pid 12795] read(4, 
"root:x:0:root\nbin:x:1:root,bin,daemon\ndaemon:x:2:root,bin,daemon\nsys:x:3:roo
t,bin,adm\nadm:x:4:root,adm,daemon\ntty:x:5:\ndisk:x:6:root\nlp:x:7:daemon,lp\nm
em:x:8:\nkmem:x:9:\nwheel:x:10:root\nmail:x:12:mail,exim\nnews:x:13:news\nuucp:x
:14:uucp\nman:x:15:\ngames:x:20:\ngopher:x:30:\ndip:x:40:\nftp:x:50:\nlock:x:54:
\nnobody:x:99:\nusers:x:100:\nnscd:x:28:\nfloppy:x:19:\nvcsa:x:69:\npcap:x:77:\n
utmp:x:22:\nutempter:x:35:\nslocate:x:21:\naudio:x:63:\nrpc:x:32:\nmailnull:x:47
:\nsmmsp:x:51:\necryptfs:x:101:\nsshd:x:74:\ndbus:x:81:\navahi:x:70:\nrpcuser:x:
29:\nnfsnobody:x:65534:\nhaldaemon:x:68:\navahi-
autoipd:x:102:\nmysql:x:27:\nftpgroup:x:107:\nexim:x:93:\nscreen:x:84:\nfwh:x:50
0:\nqwqwqw:x:501:\n1q2w3e:x:502:\n1q:x:503:\n2w:x:504:\n", 4096) = 693
[pid 12795] read(4, "", 4096)           = 0
[pid 12795] close(4)                    = 0
[pid 12795] munmap(0x2b017b3ce000, 4096) = 0
[pid 12795] setgroups(1, [500])         = 0
[pid 12795] setuid(500)                 = 0
[pid 12795] prctl(0x4, 0x1, 0, 0, 0)    = 0
[pid 12795] clock_gettime(CLOCK_MONOTONIC, {1029872, 690313456}) = 0
[pid 12795] close(6)                    = 0
[pid 12795] close(8)                    = 0
[pid 12795] rt_sigaction(SIGTERM, {SIG_DFL, [], SA_RESTORER, 0x322c6302f0}, 
NULL, 8) = 0
[pid 12795] rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x322c6302f0}, NULL, 
8) = 0
[pid 12795] rt_sigaction(SIGUSR1, {SIG_DFL, [], SA_RESTORER, 0x322c6302f0}, 
NULL, 8) = 0
[pid 12795] rt_sigaction(SIGUSR2, {SIG_DFL, [], SA_RESTORER, 0x322c6302f0}, 
NULL, 8) = 0
[pid 12795] rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTORER, 0x322c6302f0}, 
NULL, 8) = 0
[pid 12795] rt_sigaction(SIGQUIT, {0x7c33f0, [], SA_RESTORER|SA_RESTART, 
0x322c6302f0}, NULL, 8) = 0
[pid 12795] close(9)                    = 0
[pid 12795] close(10)                   = 0
[pid 12795] close(11)                   = 0
[pid 12795] close(12)                   = 0
[pid 12795] close(13)                   = 0
[pid 12795] clock_gettime(CLOCK_MONOTONIC, {1029872, 692338456}) = 0
[pid 12795] accept(0, {sa_family=AF_INET, sin_port=htons(60781), 
sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
[pid 12795] clock_gettime(CLOCK_MONOTONIC, {1029872, 692611456}) = 0
[pid 12795] time(NULL)                  = 1342505962
[pid 12795] times({tms_utime=0, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 
1821051559
[pid 12795] poll([{fd=4, events=POLLIN}], 1, 5000) = 1 ([{fd=4, 
revents=POLLIN}])
[pid 12795] read(4, "\1\1\0\1\0\10\0\0", 8) = 8
[pid 12795] read(4, "\0\1\0\0\0\0\0\0", 8) = 8
[pid 12795] read(4, "\1\4\0\1\4\313\5\0", 8) = 8
[pid 12795] read(4, 
"\17'SCRIPT_FILENAME/home/_htdocs/public_html/long_loop.php\f\0QUERY_STRING\16\3
REQUEST_METHODGET\f\0CONTENT_TYPE\16\0CONTENT_LENGTH\v\16SCRIPT_NAME/long_loop.p
hp\v\16REQUEST_URI/long_loop.php\f\16DOCUMENT_URI/long_loop.php\r\31DOCUMENT_ROO
T/home/_htdocs/public_html\17\10SERVER_PROTOCOLHTTP/1.1\21\7GATEWAY_INTERFACECGI
/1.1\17\vSERVER_SOFTWAREnginx/1.2.2\v\16REMOTE_ADDR89.221.202.206\v\5REMOTE_PORT
58343\v\16SERVER_ADDR174.142.69.107\v\2SERVER_PORT80\v\20SERVER_NAMEfwh.webasyst
.com\t\20HTTP_HOSTfwh.webasyst.com\17\nHTTP_CONNECTIONkeep-
alive\22\tHTTP_CACHE_CONTROLmax-age=0\17\200\0\0\215HTTP_USER_AGENTMozilla/5.0 
(X11; Linux x86_64) AppleWebKit/535.19 (KHTML, like Gecko) Ubuntu/11.10 
Chromium/18.0.1025.168 Chrome/18.0.1025.168 Safari/535.19\v?
HTTP_ACCEPTtext/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\24\21
HTTP_ACCEPT_ENCODINGgzip,deflate,sdch\24#HTTP_ACCEPT_LANGUAGEru-RU,ru;q=0.8,en-
US;q=0.6,en;q=0.4\23 HTTP_ACCEPT_CHARSETwindows-1251,utf-
8;q=0.7,*;q=0.3\v\200\0\0012HTTP_COOKIE__utma=144069043.2077055053.1307503204.13
42164095.1342166094.132; __utmc=144069043; 
__utmz=144069043.1341978575.128.65.utmcsr=feedburner|utmccn=Feed:%20webasystcom/
ru%20(Webasyst)|utmcmd=feed; PHPSESSID=uvg77mtbskgn2i0avlmmtaola4; 
_csrf=5003fbcba78297.60899625; 
last_page=1%5E%5E%5E%2Fwebasyst%2Fhosting%2F\0\0\0\0\0", 1232) = 1232
[pid 12795] read(4, "\1\4\0\1\0\0\0\0", 8) = 8
[pid 12795] clock_gettime(CLOCK_MONOTONIC, {1029872, 694032456}) = 0
[pid 12795] gettimeofday({1342505962, 584135}, NULL) = 0
[pid 12795] stat("/home/_htdocs/public_html/.user.ini", 0x7fffba4db660) = -1 
ENOENT (No such file or directory)
[pid 12795] setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={60, 0}}, NULL) 
= 0
[pid 12795] rt_sigaction(SIGPROF, {0x7078d0, [PROF], SA_RESTORER|SA_RESTART, 
0x322c6302f0}, {SIG_DFL, [], 0}, 8) = 0
[pid 12795] rt_sigprocmask(SIG_UNBLOCK, [PROF], NULL, 8) = 0
[pid 12795] lstat("/home/_htdocs/public_html/long_loop.php", 
{st_mode=S_IFREG|0644, st_size=76, ...}) = 0
[pid 12795] lstat("/home/_htdocs/public_html", {st_mode=S_IFDIR|0755, 
st_size=4096, ...}) = 0
[pid 12795] lstat("/home/_htdocs", {st_mode=S_IFDIR|0711, st_size=4096, ...}) = 
0
[pid 12795] lstat("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 12795] lstat("/home/_htdocs/public_html/long_loop.php", 
{st_mode=S_IFREG|0644, st_size=76, ...}) = 0
[pid 12795] lstat("/home/_htdocs/public_html", {st_mode=S_IFDIR|0755, 
st_size=4096, ...}) = 0
[pid 12795] lstat("/home/_htdocs", {st_mode=S_IFDIR|0711, st_size=4096, ...}) = 
0
[pid 12795] lstat("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 12795] lstat("/home/_htdocs/public_html/long_loop.php", 
{st_mode=S_IFREG|0644, st_size=76, ...}) = 0
[pid 12795] lstat("/home/_htdocs/public_html", {st_mode=S_IFDIR|0755, 
st_size=4096, ...}) = 0
[pid 12795] lstat("/home/_htdocs", {st_mode=S_IFDIR|0711, st_size=4096, ...}) = 
0
[pid 12795] lstat("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 12795] lstat("/home/_htdocs/public_html/long_loop.php", 
{st_mode=S_IFREG|0644, st_size=76, ...}) = 0
[pid 12795] lstat("/home/_htdocs/public_html", {st_mode=S_IFDIR|0755, 
st_size=4096, ...}) = 0
[pid 12795] lstat("/home/_htdocs", {st_mode=S_IFDIR|0711, st_size=4096, ...}) = 
0
[pid 12795] lstat("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 12795] lstat("/home/_htdocs", {st_mode=S_IFDIR|0711, st_size=4096, ...}) = 
0
[pid 12795] lstat("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 12795] open("/home/_htdocs/public_html/long_loop.php", O_RDONLY) = 6
[pid 12795] fstat(6, {st_mode=S_IFREG|0644, st_size=76, ...}) = 0
[pid 12795] fstat(6, {st_mode=S_IFREG|0644, st_size=76, ...}) = 0
[pid 12795] fstat(6, {st_mode=S_IFREG|0644, st_size=76, ...}) = 0
[pid 12795] mmap(NULL, 76, PROT_READ, MAP_SHARED, 6, 0) = 0x2b017b3ce000
[pid 12795] clock_gettime(CLOCK_MONOTONIC, {1029872, 698631456}) = 0
[pid 12795] getcwd("/usr/local/nginx/logs"..., 4095) = 22
[pid 12795] chdir("/home/_htdocs/public_html") = 0
[pid 12795] setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={30, 0}}, NULL) 
= 0
[pid 12795] munmap(0x2b017b3ce000, 76)  = 0
[pid 12795] close(6)                    = 0
[pid 12795] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 12795] rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], SA_RESTORER, 
0x322c6302f0}, 8) = 0
[pid 12795] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 12795] nanosleep({1, 0},  <unfinished ...>
[pid 12788] <... epoll_wait resumed> {}, 1, 737) = 0
[pid 12788] clock_gettime(CLOCK_MONOTONIC, {1029873, 411240456}) = 0
[pid 12788] clock_gettime(CLOCK_MONOTONIC, {1029873, 411340456}) = 0
[pid 12788] clock_gettime(CLOCK_MONOTONIC, {1029873, 411447456}) = 0
[pid 12788] epoll_wait(16,  <unfinished ...>
[pid 12795] <... nanosleep resumed> {1, 0}) = 0
[pid 12795] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 12795] rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], SA_RESTORER, 
0x322c6302f0}, 8) = 0
[pid 12795] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 12795] nanosleep({1, 0},  <unfinished ...>
[pid 12788] <... epoll_wait resumed> {}, 1, 1000) = 0
[pid 12788] clock_gettime(CLOCK_MONOTONIC, {1029874, 411323456}) = 0
[pid 12788] clock_gettime(CLOCK_MONOTONIC, {1029874, 411442456}) = 0
[pid 12788] getsockopt(13, SOL_TCP, TCP_INFO, "\n\0\0\0\0\0\0\0\300\306-
\0\0\0\0\0\30\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\240\23]=\0\0
\0\0\240\23]=\240\23]=\0\0\0\0\0\0\0\0\0\0\0\0\260q\v\0\377\377\377\177\2\0\0\0\
0\0\0\0\3\0\0\0", [92]) = 0
[pid 12788] clock_gettime(CLOCK_MONOTONIC, {1029874, 411797456}) = 0
[pid 12788] clock_gettime(CLOCK_MONOTONIC, {1029874, 411921456}) = 0
[pid 12788] epoll_wait(16,  <unfinished ...>
[pid 12795] <... nanosleep resumed> {1, 0}) = 0
[pid 12795] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
[pid 12795] rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], SA_RESTORER, 
0x322c6302f0}, 8) = 0
[pid 12795] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 12795] nanosleep({1, 0},  <unfinished ...>
[pid 12788] <... epoll_wait resumed> 13bda920, 1, 1000) = -1 EINTR (Interrupted 
system call)
[pid 12788] --- SIGUSR2 (User defined signal 2) @ 0 (0) ---
[pid 12788] write(8, "2", 1)            = 1
[pid 12788] rt_sigreturn(0x8)           = -1 EINTR (Interrupted system call)
[pid 12788] clock_gettime(CLOCK_MONOTONIC, {1029875, 99361456}) = 0
[pid 12788] clock_gettime(CLOCK_MONOTONIC, {1029875, 99472456}) = 0
[pid 12788] clock_gettime(CLOCK_MONOTONIC, {1029875, 99580456}) = 0
[pid 12788] epoll_wait(16, {{EPOLLIN, {u32=15680416, u64=15680416}}}, 1, 312) = 
1
[pid 12788] read(6, "2", 1)             = 1
[pid 12788] gettimeofday({1342505964, 989857}, NULL) = 0
[pid 12788] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 
0
[pid 12788] write(4, "[17-Jul-2012 02:19:24] NOTICE: Reloading in progress 
...\n", 57) = 57
[pid 12788] kill(12795, SIGQUIT <unfinished ...>
[pid 12795] <... nanosleep resumed> 0x7fffba4d94a0) = ? ERESTART_RESTARTBLOCK 
(To be restarted)
[pid 12788] <... kill resumed> )        = 0
[pid 12795] --- SIGQUIT (Quit) @ 0 (0) ---
[pid 12788] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
[pid 12795] close(0 <unfinished ...>
[pid 12788] <... clock_gettime resumed> {1029875, 100807456}) = 0
[pid 12795] <... close resumed> )       = 0
[pid 12788] read(6,  <unfinished ...>
[pid 12795] socket(PF_FILE, SOCK_STREAM, 0 <unfinished ...>
[pid 12788] <... read resumed> 0x7fffba4db8b7, 1) = -1 EAGAIN (Resource 
temporarily unavailable)
[pid 12795] <... socket resumed> )      = 0
[pid 12788] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
[pid 12795] rt_sigreturn(0x1 <unfinished ...>
[pid 12788] <... clock_gettime resumed> {1029875, 101356456}) = 0
[pid 12795] <... rt_sigreturn resumed> ) = -1 EINTR (Interrupted system call)
[pid 12788] kill(12795, SIGTERM <unfinished ...>
[pid 12795] rt_sigprocmask(SIG_BLOCK, [CHLD],  <unfinished ...>
[pid 12788] <... kill resumed> )        = 0
[pid 12795] <... rt_sigprocmask resumed> [], 8) = 0
[pid 12788] clock_gettime(CLOCK_MONOTONIC,  <unfinished ...>
[pid 12795] --- SIGTERM (Terminated) @ 0 (0) ---
Process 12795 detached
<... clock_gettime resumed> {1029875, 101834456}) = 0
clock_gettime(CLOCK_MONOTONIC, {1029875, 103461456}) = 0
clock_gettime(CLOCK_MONOTONIC, {1029875, 103566456}) = 0
clock_gettime(CLOCK_MONOTONIC, {1029875, 103843456}) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
write(8, "C", 1)                        = 1
rt_sigreturn(0x8)                       = 0
epoll_wait(16, {{EPOLLIN, {u32=15680416, u64=15680416}}}, 1, 308) = 1
read(6, "C", 1)                         = 1
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG|WSTOPPED, NULL) 
= 12795
clock_gettime(CLOCK_MONOTONIC, {1029875, 104400456}) = 0
gettimeofday({1342505964, 994394}, NULL) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3477, ...}) = 0
write(4, "[17-Jul-2012 02:19:24] NOTICE: reloading: execvp(\"/usr/sbin/php-
fpm\", {\"/usr/sbin/php-fpm\", \"--fpm-config\", \"/etc/php-fpm.conf\"})\n", 
130) = 130
munmap(0x2b017b3cf000, 1168)            = 0
munmap(0x2b017b3d0000, 1168)            = 0
munmap(0x2b017b3d1000, 1168)            = 0
munmap(0x2b017b3d2000, 1168)            = 0
munmap(0x2b017b3d3000, 1168)            = 0
munmap(0x2b017b3d4000, 1168)            = 0
munmap(0x2b017b3d5000, 1168)            = 0
munmap(0x2b017b3d6000, 1168)            = 0
munmap(0x2b017b3d7000, 1168)            = 0
munmap(0x2b017eb57000, 1168)            = 0
munmap(0x2b017eb58000, 1168)            = 0
munmap(0x2b017eb59000, 1168)            = 0
munmap(0x2b017ed65000, 1168)            = 0
 [2013-02-13 09:47 UTC] php-bugs at puzzled dot xs4all dot nl
CentOS 6.3 with php-5.3.3 and the FPM code from 5.3.20 still has the same issue. FWIW this makes it challenging to use nginx with php-fpm because it results in a 502 Bad Gateway. I'm more than happy to help testing a patch.
 [2013-02-13 15:57 UTC] phpbugs at oops dot mooo dot com
Try setting process_control_timeout to something higher than 0.
 [2013-09-18 16:22 UTC] baohx2000 at gmail dot com
Having this issue on 5.5.3.  Sending USR2 to master process kills the master and 
orphans the children.  Have to then kill all children before I can restart the 
master process.
 [2014-02-19 19:35 UTC] ptr dot wang at gmail dot com
php5-fpm 5.5.8+dfsg-3+sury.org~precise+2 on ubuntu precise has the same problem.
killing USR2 signal to php5-fpm master process causes all php5-fpm processes (include the master process) replaced, and the active requests end up with HTTP 502 Bad gateway error on the nginx side.
 
PHP Copyright © 2001-2014 The PHP Group
All rights reserved.
Last updated: Fri Apr 18 10:03:03 2014 UTC