php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #65398 Race condition between SIGCHLD and child stdout/stderr event leads to segfault
Submitted: 2013-08-06 06:53 UTC Modified: -
Votes:10
Avg. Score:4.9 ± 0.3
Reproduced:9 of 9 (100.0%)
Same Version:7 (77.8%)
Same OS:8 (88.9%)
From: gonzalo dot paniagua at acquia dot com Assigned:
Status: Open Package: FPM related
PHP Version: 5.5.1 OS: Linux
Private report: No CVE-ID: None
Have you experienced this issue?
Rate the importance of this bug to you:

 [2013-08-06 06:53 UTC] gonzalo dot paniagua at acquia dot com
Description:
------------
This problem affects 5.3.27, 5.4.17, 5.5.1.

The master fpm process dies with a SIGSEGV. Relevant configuration options that 
we have set is "catch_workers_output = yes" and pm on_demand.

What follows is a stack trace, the strace output, and an explanation of the 
events that lead to the segfault.

StaThread 1 (Thread 0x7ff47eed4720 (LWP 15282)):
#0  0x00007ff479fe5cc6 in fork () from /lib/libc.so.6
#1  0x0000000000870c05 in fpm_children_make (wp=0x1d746d0, in_event_loop=<value 
optimized out>, nb_to_spawn=<value optimized out>, is_debug=0)
    at /usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm_children.c:400
#2  0x000000000087125c in fpm_children_bury () at /usr/local/src/php-
5.3.26/sapi/fpm/fpm/fpm_children.c:288
#3  0x0000000000875942 in fpm_got_signal (ev=<value optimized out>, which=<value 
optimized out>, arg=<value optimized out>)
    at /usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm_events.c:73
#4  0x0000000000880833 in fpm_event_epoll_wait (queue=<value optimized out>, 
timeout=<value optimized out>) at /usr/local/src/php-
5.3.26/sapi/fpm/fpm/events/epoll.c:143
#5  0x00000000008755d7 in fpm_event_loop (err=0) at /usr/local/src/php-
5.3.26/sapi/fpm/fpm/fpm_events.c:403
#6  0x00000000008708c7 in fpm_run (max_requests=0x7fffafba0acc) at 
/usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm.c:113
#7  0x0000000000877dec in main (argc=5, argv=0x7fffafba2c38) at 
/usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm_main.c:1842ck trace:

Strace output:
[pid 25742] <... clock_gettime resumed> {4285884, 268457129}) = 0
[pid 25742] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 25742] write(33, "C", 1)           = 1
[pid 25742] rt_sigreturn(0)             = 0
[pid 25742] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 25742] write(33, "C", 1)           = 1
[pid 25742] rt_sigreturn(0)             = 0
[pid 25742] epoll_wait(86, {{EPOLLHUP, {u32=24867480, u64=24867480}}, {EPOLLHUP, 
{u32=24867552, u64=24867552}}, {EPOLLIN, {u32=17325088, u64=17325088}}, 
{EPOLLHUP, {u32=24866200, u64=24866200}}, {EPOLLHUP, {u32=24866272, 
u64=24866272}}}, 261, 1001) = 5
[pid 25742] read(87, "", 1023)          = 0
[pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 87, {EPOLLIN, {u32=24867480, 
u64=24867480}}) = 0
[pid 25742] close(87)                   = 0
[pid 25742] read(90, "", 1023)          = 0
[pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 90, {EPOLLIN, {u32=24867552, 
u64=24867552}}) = 0
[pid 25742] close(90)                   = 0
[pid 25742] read(31, "C", 1)            = 1
[pid 25742] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, 
NULL) = 26809
[pid 25742] clock_gettime(CLOCK_MONOTONIC, {4285884, 268457129}) = 0
[pid 25742] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, 
NULL) = 26813
[pid 25742] clock_gettime(CLOCK_MONOTONIC, {4285884, 268457129}) = 0
[pid 25742] read(89, "", 1023)          = 0
[pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 89, {EPOLLIN, {u32=24866200, 
u64=24866200}}) = 0
[pid 25742] close(89)                   = 0
[pid 25742] read(92, "", 1023)          = 0
[pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 92, {EPOLLIN, {u32=24866272, 
u64=24866272}}) = 0
[pid 25742] close(92)                   = 0
[pid 25742] wait4(-1, 0x7fffc66ba82c, WNOHANG|WSTOPPED, NULL) = 0
[pid 25742] read(31, "C", 1)            = 1
[pid 25742] wait4(-1, 0x7fffc66ba82c, WNOHANG|WSTOPPED, NULL) = 0
[pid 25742] read(31, 0x7fffc66ba90f, 1) = -1 EAGAIN (Resource temporarily 
unavailable)
[pid 25742] read(89, 0x7fffc66ba490, 1023) = -1 EBADF (Bad file descriptor)
[pid 25742] gettimeofday({1375385643, 703965}, NULL) = 0
[pid 25742] open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) 
= -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = 
-1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = 
-1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en.utf8/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo", O_RDONLY) 
= -1 ENOENT (No such file or directory)
[pid 25742] write(4, "[01-Aug-2013 19:34:03] ERROR: un"..., 85) = 85
[pid 25742] close(4294967295)           = -1 EBADF (Bad file descriptor)
[pid 25742] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[pid 25742] futex(0x7fba74d4f820, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 25742] --- SIGSEGV (Segmentation fault) @ 0 (0) ---

What's going on? In order of appearance:
  * 2 processes finish, which triggers the SIGCHLD handler twice. They both 
write one character to the writing end of a pipe.
  * epoll_wait() wakes up containing 5 events:
    a) 2 EPOLLHUP are the stdout and stderr of PID 26809.
    b) 1 is the availability of data in the read end of the pipe used by the 
SIGCHLD handler.
    c) 2 EPOLLHUP are the stdout and stderr of PID 26813.
  * The file descriptors 87 and 90 from a) above are closed and removed from the 
epoll socket.
  * Now we go on to process the SIGCHLD. Notice that the loop finds the 2 
processes that died. In both cases, fpm_children_close() is called, which in 
turn calls fpm_children_free().
  * Time to process the events from c), only that in this case, the 'arg' 
received by "fpm_stdio_child_said(struct fpm_event_s *ev, short which, void 
*arg)" points to a location of memory that has been freed in the previous step. 
EBADF and SIGSEGV follow shortly after that.


I am attaching a patch that fixes this problem for the epoll backend. If this is 
accepted as the right way to fix this issue, it is possible that similar fixes 
have to made to devpoll, poll, select, and kqueue event backends.

The patch simply ignores EPOLLHUP events since there is nothing to read and the 
SIGCHLD generated event will take care of cleaning up the file descriptors. This 
patch fixed the issue for us.


Patches

fpm-race-condition.patch (last revision 2013-08-06 06:54 UTC) by gonzalo dot paniagua at gmail dot com)

Add a Patch

Pull Requests

Add a Pull Request

 
PHP Copyright © 2001-2017 The PHP Group
All rights reserved.
Last updated: Sun Nov 19 01:31:42 2017 UTC