php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #65398 Race condition between SIGCHLD and child stdout/stderr event leads to segfault
Submitted: 2013-08-06 06:53 UTC Modified: 2021-12-08 22:06 UTC
Votes:11
Avg. Score:4.7 ± 0.6
Reproduced:10 of 10 (100.0%)
Same Version:8 (80.0%)
Same OS:8 (80.0%)
From: gonzalo dot paniagua at acquia dot com Assigned: bukka (profile)
Status: Closed Package: FPM related
PHP Version: 5.5.1 OS: Linux
Private report: No CVE-ID: None
 [2013-08-06 06:53 UTC] gonzalo dot paniagua at acquia dot com
Description:
------------
This problem affects 5.3.27, 5.4.17, 5.5.1.

The master fpm process dies with a SIGSEGV. Relevant configuration options that 
we have set is "catch_workers_output = yes" and pm on_demand.

What follows is a stack trace, the strace output, and an explanation of the 
events that lead to the segfault.

StaThread 1 (Thread 0x7ff47eed4720 (LWP 15282)):
#0  0x00007ff479fe5cc6 in fork () from /lib/libc.so.6
#1  0x0000000000870c05 in fpm_children_make (wp=0x1d746d0, in_event_loop=<value 
optimized out>, nb_to_spawn=<value optimized out>, is_debug=0)
    at /usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm_children.c:400
#2  0x000000000087125c in fpm_children_bury () at /usr/local/src/php-
5.3.26/sapi/fpm/fpm/fpm_children.c:288
#3  0x0000000000875942 in fpm_got_signal (ev=<value optimized out>, which=<value 
optimized out>, arg=<value optimized out>)
    at /usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm_events.c:73
#4  0x0000000000880833 in fpm_event_epoll_wait (queue=<value optimized out>, 
timeout=<value optimized out>) at /usr/local/src/php-
5.3.26/sapi/fpm/fpm/events/epoll.c:143
#5  0x00000000008755d7 in fpm_event_loop (err=0) at /usr/local/src/php-
5.3.26/sapi/fpm/fpm/fpm_events.c:403
#6  0x00000000008708c7 in fpm_run (max_requests=0x7fffafba0acc) at 
/usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm.c:113
#7  0x0000000000877dec in main (argc=5, argv=0x7fffafba2c38) at 
/usr/local/src/php-5.3.26/sapi/fpm/fpm/fpm_main.c:1842ck trace:

Strace output:
[pid 25742] <... clock_gettime resumed> {4285884, 268457129}) = 0
[pid 25742] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 25742] write(33, "C", 1)           = 1
[pid 25742] rt_sigreturn(0)             = 0
[pid 25742] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 25742] write(33, "C", 1)           = 1
[pid 25742] rt_sigreturn(0)             = 0
[pid 25742] epoll_wait(86, {{EPOLLHUP, {u32=24867480, u64=24867480}}, {EPOLLHUP, 
{u32=24867552, u64=24867552}}, {EPOLLIN, {u32=17325088, u64=17325088}}, 
{EPOLLHUP, {u32=24866200, u64=24866200}}, {EPOLLHUP, {u32=24866272, 
u64=24866272}}}, 261, 1001) = 5
[pid 25742] read(87, "", 1023)          = 0
[pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 87, {EPOLLIN, {u32=24867480, 
u64=24867480}}) = 0
[pid 25742] close(87)                   = 0
[pid 25742] read(90, "", 1023)          = 0
[pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 90, {EPOLLIN, {u32=24867552, 
u64=24867552}}) = 0
[pid 25742] close(90)                   = 0
[pid 25742] read(31, "C", 1)            = 1
[pid 25742] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, 
NULL) = 26809
[pid 25742] clock_gettime(CLOCK_MONOTONIC, {4285884, 268457129}) = 0
[pid 25742] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, 
NULL) = 26813
[pid 25742] clock_gettime(CLOCK_MONOTONIC, {4285884, 268457129}) = 0
[pid 25742] read(89, "", 1023)          = 0
[pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 89, {EPOLLIN, {u32=24866200, 
u64=24866200}}) = 0
[pid 25742] close(89)                   = 0
[pid 25742] read(92, "", 1023)          = 0
[pid 25742] epoll_ctl(86, EPOLL_CTL_DEL, 92, {EPOLLIN, {u32=24866272, 
u64=24866272}}) = 0
[pid 25742] close(92)                   = 0
[pid 25742] wait4(-1, 0x7fffc66ba82c, WNOHANG|WSTOPPED, NULL) = 0
[pid 25742] read(31, "C", 1)            = 1
[pid 25742] wait4(-1, 0x7fffc66ba82c, WNOHANG|WSTOPPED, NULL) = 0
[pid 25742] read(31, 0x7fffc66ba90f, 1) = -1 EAGAIN (Resource temporarily 
unavailable)
[pid 25742] read(89, 0x7fffc66ba490, 1023) = -1 EBADF (Bad file descriptor)
[pid 25742] gettimeofday({1375385643, 703965}, NULL) = 0
[pid 25742] open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) 
= -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = 
-1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = 
-1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 
ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en.utf8/LC_MESSAGES/libc.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 25742] open("/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo", O_RDONLY) 
= -1 ENOENT (No such file or directory)
[pid 25742] write(4, "[01-Aug-2013 19:34:03] ERROR: un"..., 85) = 85
[pid 25742] close(4294967295)           = -1 EBADF (Bad file descriptor)
[pid 25742] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[pid 25742] futex(0x7fba74d4f820, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 25742] --- SIGSEGV (Segmentation fault) @ 0 (0) ---

What's going on? In order of appearance:
  * 2 processes finish, which triggers the SIGCHLD handler twice. They both 
write one character to the writing end of a pipe.
  * epoll_wait() wakes up containing 5 events:
    a) 2 EPOLLHUP are the stdout and stderr of PID 26809.
    b) 1 is the availability of data in the read end of the pipe used by the 
SIGCHLD handler.
    c) 2 EPOLLHUP are the stdout and stderr of PID 26813.
  * The file descriptors 87 and 90 from a) above are closed and removed from the 
epoll socket.
  * Now we go on to process the SIGCHLD. Notice that the loop finds the 2 
processes that died. In both cases, fpm_children_close() is called, which in 
turn calls fpm_children_free().
  * Time to process the events from c), only that in this case, the 'arg' 
received by "fpm_stdio_child_said(struct fpm_event_s *ev, short which, void 
*arg)" points to a location of memory that has been freed in the previous step. 
EBADF and SIGSEGV follow shortly after that.


I am attaching a patch that fixes this problem for the epoll backend. If this is 
accepted as the right way to fix this issue, it is possible that similar fixes 
have to made to devpoll, poll, select, and kqueue event backends.

The patch simply ignores EPOLLHUP events since there is nothing to read and the 
SIGCHLD generated event will take care of cleaning up the file descriptors. This 
patch fixed the issue for us.


Patches

fpm-race-condition.patch (last revision 2013-08-06 06:54 UTC by gonzalo dot paniagua at gmail dot com)

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2019-01-23 04:41 UTC] mnikulin at plesk dot com
I have tried the patch https://bugs.php.net/patch-display.php?bug_id=65398&patch=fpm-race-condition.patch&revision=latest
Use after free still happens in php-7.3.1 with 77430 fix.
Tested --enable-debug build on CentOS-7.

Bug #77185 Use-after-free in FPM master event handling

   pm = static
   pm.max_children = 10
   catch_workers_output = true

and repeating reloads during several minutes.
I use Environment=MALLOC_PERTURB_=165 trick from #77185
to make use after free obvious.

(gdb) bt
#0  0x000000000085c1de in fpm_event_fire (ev=0x133a980) at /usr/src/debug/sapi/fpm/fpm/fpm_events.c:465
#1  0x000000000086f51f in fpm_event_epoll_wait (queue=0x133b5e0, timeout=943) at /usr/src/debug/sapi/fpm/fpm/events/epoll.c:146
#2  0x000000000085bf24 in fpm_event_loop (err=0) at /usr/src/debug/sapi/fpm/fpm/fpm_events.c:409
#3  0x000000000085420f in fpm_run (max_requests=0x7ffca475e17c) at /usr/src/debug/sapi/fpm/fpm/fpm.c:113
#4  0x00000000008618e9 in main (argc=2, argv=0x7ffca475e3b8) at /usr/src/debug/sapi/fpm/fpm/fpm_main.c:1873
(gdb) p *ev
$1 = {fd = -1515870811, timeout = {tv_sec = -6510615555426900571, tv_usec = -6510615555426900571}, frequency = {tv_sec = -6510615555426900571, tv_usec = -6510615555426900571}, callback = 0xa5a5a5a5a5a5a5a5, arg = 0xa5a5a5a5a5a5a5a5, 
  flags = -1515870811, index = -1515870811, which = -23131}

sapi/fpm/fpm/events/epoll.c:146 is the line added by the suggested patch

144                 /* fire the event */
145                 if (epollfds[i].events != EPOLLHUP)                                                                                                                                                                                     
146                         fpm_event_fire((struct fpm_event_s *)epollfds[i].data.ptr);                                                                                                                                                     
147 

Another case is
Bug #77114 php-fpm master segfaults in fpm_event_epoll_wait/fpm_event_fire
no reloads, ~30 pools adjusted to have short-lived children
and moderate load with pauses
    pm = ondemand
    pm.process_idle_timeout = 2s
    pm.max_requests = 2
    catch_workers_output = yes
    php_value[error_reporting] = 22519

The same failure
#0  0x000000000085c1de in fpm_event_fire (ev=0x116a370) at /usr/src/debug/sapi/fpm/fpm/fpm_events.c:465
#1  0x000000000086f51f in fpm_event_epoll_wait (queue=0x116b740, timeout=876) at /usr/src/debug/sapi/fpm/fpm/events/epoll.c:146
#2  0x000000000085bf24 in fpm_event_loop (err=0) at /usr/src/debug/sapi/fpm/fpm/fpm_events.c:409
#3  0x000000000085420f in fpm_run (max_requests=0x7ffd449b113c) at /usr/src/debug/sapi/fpm/fpm/fpm.c:113
#4  0x00000000008618e9 in main (argc=2, argv=0x7ffd449b1378) at /usr/src/debug/sapi/fpm/fpm/fpm_main.c:1873
(gdb) p *ev
$1 = {fd = -1515870811, timeout = {tv_sec = -6510615555426900571, tv_usec = -6510615555426900571}, frequency = {tv_sec = -6510615555426900571, tv_usec = -6510615555426900571}, callback = 0xa5a5a5a5a5a5a5a5, arg = 0xa5a5a5a5a5a5a5a5, 
  flags = -1515870811, index = -1515870811, which = -23131}

There are at least one more bug reported with similar symptoms
Bug #62418 php-fpm master process crashes
 [2021-12-08 22:06 UTC] bukka@php.net
-Status: Open +Status: Closed -Assigned To: +Assigned To: bukka
 [2021-12-08 22:06 UTC] bukka@php.net
I think this should be addressed by the changes that went to PHP 7.4. For example https://github.com/php/php-src/pull/4385 and there are other fixes as well.

Please open a new bug report or comment here if you still see the issue in the latest PHP version.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Tue Dec 10 07:01:28 2024 UTC