Bug #40286 PHP fastcgi with PHP_FCGI_CHILDREN don't kill children when parent is killed
Submitted: 2007-01-30 11:34 UTC Modified: 2008-05-26 01:00 UTC
Votes:70
Avg. Score:4.5 ± 0.8
Reproduced:58 of 60 (96.7%)
Same Version:35 (60.3%)
Same OS:40 (69.0%)
From: gabriel at oxeva dot fr Assigned: dmitry (profile)
Status: No Feedback Package: CGI/CLI related
PHP Version: 5.2.0+ OS: Linux 2.6
Private report: No CVE-ID: None
 [2007-01-30 11:34 UTC] gabriel at oxeva dot fr
Description:
------------
Context:
When running PHP in FastCGI mode with a FastCGI Apache module (such as mod_fcgid), everything runs fine when PHP_FCGI_CHILDREN is unset: only 1 process is spawned. When using PHP_FCGI_CHILDREN=n, the PHP parent process forks n children and acts as a manager for them, wait()ing to respawn them if they are killed or exit. The problem happens when the FastCGI process manager handled by the Apache module has to kill the parent PHP process (it only knows the parent's PID) for any reason such as idle timeout, max lifetime, etc.

Problem:
While the PHP parent process is properly killed by the FastCGI process manager, the children aren't killed. Instead they stay alive, waiting for a new request which will never come (because the socket shared with the parent is removed at the same time the parent is killed).

Reproduce code:
---------------
This is not always reproducible, as the problem only happens when the php FastCGI processes are busy.

The only way to kill these "orphan" children is to send them signal 9 (to interrupt the blocking read() syscall they are executing).

Expected result:
----------------
In the example, the FastCGI process manager spawns PHP by fork()ing then exec()ing /path/to/php, with environment PHP_FCGI_CHILDREN=2.

The PHP parent process is PID 10, and it forks 2 children, PID 11 and 12.

When PID 10 is killed with the normal signal 15 while all the PHP processes are under load, PID 10 dies, but the 2 children PID 11 and 12 stay alive.

The expected result is that when the PHP parent process is killed, all the children in any processing state are killed too.
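
To make the expected behaviour concrete, here is a minimal sketch of a manager that forwards SIGTERM to the workers it forked. It is not the code in sapi/cgi/cgi_main.c; the names spawn_workers, install_term_forwarder, reap_workers and MAX_WORKERS are hypothetical, and the workers simply block in pause() as a stand-in for a worker blocked in a syscall.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16   /* hypothetical limit for this sketch */

static pid_t workers[MAX_WORKERS];
static volatile sig_atomic_t worker_count = 0;

/* SIGTERM handler for the parent: pass the signal on to every worker.
 * kill(2) is async-signal-safe, so it may be called from a handler. */
static void forward_to_workers(int signo)
{
    int saved_errno = errno;
    for (int i = 0; i < worker_count; i++)
        kill(workers[i], signo);
    errno = saved_errno;
}

/* Install the forwarding handler in the parent. Returns 0 on success. */
int install_term_forwarder(void)
{
    struct sigaction sa;
    sa.sa_handler = forward_to_workers;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Fork n workers. Returns the number of workers running, or -1 on error. */
int spawn_workers(int n)
{
    sigset_t term, old;
    sigemptyset(&term);
    sigaddset(&term, SIGTERM);

    for (int i = 0; i < n && worker_count < MAX_WORKERS; i++) {
        /* Block SIGTERM around fork() so the handler never sees a
         * half-updated table and a child never runs the inherited handler. */
        sigprocmask(SIG_BLOCK, &term, &old);
        pid_t pid = fork();
        if (pid == 0) {
            signal(SIGTERM, SIG_DFL);             /* workers die on TERM */
            sigprocmask(SIG_SETMASK, &old, NULL);
            for (;;)
                pause();                          /* stand-in for blocking I/O */
        }
        if (pid > 0)
            workers[worker_count++] = pid;
        sigprocmask(SIG_SETMASK, &old, NULL);
        if (pid < 0)
            return -1;
    }
    return worker_count;
}

/* Wait for every worker; returns how many were terminated by SIGTERM. */
int reap_workers(void)
{
    int terminated = 0;
    for (int i = 0; i < worker_count; i++) {
        int status;
        pid_t r;
        do {
            r = waitpid(workers[i], &status, 0);
        } while (r < 0 && errno == EINTR);
        if (r > 0 && WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
            terminated++;
    }
    worker_count = 0;
    return terminated;
}
```

With two workers spawned, a SIGTERM delivered to the parent should leave no survivors: reap_workers() reports both as terminated by signal 15. Forwarding only helps when the parent gets a chance to run its handler, which is not the case for signal 9.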

Actual result:
--------------
strace of children processes (PID 11 and 12) still alive gives :
# strace -p 11
Process 11 attached - interrupt to quit
read(3,  <unfinished ...>

PID 12 gives the same result.

 [2007-01-30 11:40 UTC] tony2001@php.net
From what I can see in the sources, FastCGI registers a signal handler to kill its children on shutdown (see sapi/cgi/cgi_main.c, line 1219), but this handler surely won't be called on SIGKILL. Hence the question - what signal do you mean by "killed"?
 [2007-01-30 11:50 UTC] gabriel at oxeva dot fr
Throughout the report, "killed" means killed with signal 15 (TERM).

As stated in the report, the children are blocked in a syscall, which means they can only be killed by signal 9 (KILL). The fastcgi_cleanup function registered on shutdown kills with the TERM signal (15). I think the bug occurs when the children, under load, are executing a syscall at the moment the parent is killed and starts fastcgi_cleanup. A quick workaround would be to kill the children with signal 9 in fastcgi_cleanup, at sapi/cgi/cgi_main.c:951.
 [2007-01-30 13:20 UTC] dmitry@php.net
Could you please attach a debugger to a non-killed process and provide a backtrace?

Does php-5.2 have the same problem?
 [2007-01-30 14:26 UTC] gabriel at oxeva dot fr
strace -p <PID> provides the following :
read(3,  <unfinished ...>

and gdb program <PID> and "bt" provides :
(gdb) bt
#0  0xb7fe3410 in ?? ()
#1  0xbfd86618 in ?? ()
#2  0x00000008 in ?? ()
#3  0xbfd86600 in ?? ()
#4  0x008e14f3 in __read_nocancel () from /lib/tls/libc.so.6
#5  0x083ba23e in fcgi_read ()
#6  0x083bbb38 in FCGX_FPrintF ()
#7  0x0831ab22 in sapi_deactivate ()
#8  0x08314a3d in php_request_shutdown ()
#9  0x083bcdeb in main ()

Please note that I can't test with debugging symbols (the libraries and PHP are stripped), as this binary is in production environment and the bug occurs only under load.
 [2007-01-30 14:27 UTC] gabriel at oxeva dot fr
I forgot to say that PHP 5 has exactly the same problem.
 [2007-01-30 14:33 UTC] gabriel at oxeva dot fr
I forgot to mention that the backtrace provided is from a PHP 5.1.4 process, not PHP 5.2. Sorry for the confusion.

I can compile and run a PHP 5.2 process and wait until one is killed without its children, but it will take some time to give you the results.
 [2007-01-31 11:56 UTC] gabriel at oxeva dot fr
Today I can confirm that the bug exists with PHP 5.2.0 too. Here is the backtrace:

(gdb) bt
#0  0xb7fb2410 in ?? ()
#1  0xbfc06988 in ?? ()
#2  0x00000008 in ?? ()
#3  0xbfc069b0 in ?? ()
#4  0x006ee4f3 in __read_nocancel () from /lib/tls/libc.so.6
#5  0x0841b6d4 in fcgi_init_request ()
#6  0x0841b770 in fcgi_read ()
#7  0x0841c546 in fcgi_putenv ()
#8  0x08382d33 in sapi_deactivate ()
#9  0x0837c4f6 in php_request_shutdown ()
#10 0x0841e463 in main ()
 [2007-02-16 11:48 UTC] dmitry@php.net
I hope the bug is fixed in CVS HEAD, PHP_5_2 (not in 5.2.1) and PHP_4_4 (not in 4.4.5).

The patch for PHP_5_2 follows:

Index: sapi/cgi/cgi_main.c
===================================================================
RCS file: /repository/php-src/sapi/cgi/cgi_main.c,v
retrieving revision 1.267.2.15.2.22
diff -u -p -d -r1.267.2.15.2.22 cgi_main.c
--- sapi/cgi/cgi_main.c	15 Feb 2007 12:33:16 -0000	1.267.2.15.2.22
+++ sapi/cgi/cgi_main.c	16 Feb 2007 11:16:39 -0000
@@ -355,18 +355,14 @@ static int sapi_cgi_send_headers(sapi_he
 
 static int sapi_cgi_read_post(char *buffer, uint count_bytes TSRMLS_DC)
 {
-	uint read_bytes=0, tmp_read_bytes;
-#if PHP_FASTCGI
-	char *pos = buffer;
-#endif
+	int read_bytes=0, tmp_read_bytes;
 
 	count_bytes = MIN(count_bytes, (uint) SG(request_info).content_length - SG(read_post_bytes));
 	while (read_bytes < count_bytes) {
 #if PHP_FASTCGI
 		if (fcgi_is_fastcgi()) {
 			fcgi_request *request = (fcgi_request*) SG(server_context);
-			tmp_read_bytes = fcgi_read(request, pos, count_bytes - read_bytes);
-			pos += tmp_read_bytes;
+			tmp_read_bytes = fcgi_read(request, buffer + read_bytes, count_bytes - read_bytes);
 		} else {
 			tmp_read_bytes = read(0, buffer + read_bytes, count_bytes - read_bytes);
 		}

 [2007-08-22 14:44 UTC] gabriel at oxeva dot fr
Comment from ondrej@sury.org : 

I don't believe that this patch could correct blocking I/O.

What this patch does is just remove one extra memory pointer which was
not needed.
 
Correct fix would be to use O_NONBLOCK when opening file descriptor and
then test for EAGAIN.  Or use select(2) before reading from descriptor
in safe_read() function to test if data is available for reading.

I could be wrong, but it just doesn't seem to be a fix for this problem.
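
As an illustration of the select(2) suggestion in the comment above, here is a minimal sketch of a read that gives up after a timeout instead of blocking forever. The helper name read_with_timeout and the READ_TIMED_OUT return code are hypothetical; this is not PHP's safe_read() and not the patch that was committed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_TIMED_OUT (-2)   /* hypothetical return code for this sketch */

/* Wait up to timeout_ms for fd to become readable, then read once.
 * Returns the byte count, 0 on EOF, -1 on error, READ_TIMED_OUT on timeout. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set readable;
    struct timeval tv;
    int ready;

    if (fd < 0 || fd >= FD_SETSIZE) {   /* select() cannot watch this fd */
        errno = EBADF;
        return -1;
    }

    do {
        /* Re-arm on every pass: select() may modify both arguments.
         * For simplicity, an interrupted wait restarts the full timeout. */
        FD_ZERO(&readable);
        FD_SET(fd, &readable);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        ready = select(fd + 1, &readable, NULL, NULL, &tv);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;
    if (ready == 0)
        return READ_TIMED_OUT;
    return read(fd, buf, len);
}
```

A worker loop built on such a helper would regain control periodically, which gives it a chance to notice that its manager has gone away. Whether that is the right fix for this bug is the open question in this thread.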
 [2007-09-14 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2007-09-27 03:20 UTC] atomo64 at gmail dot com
jani@php.net:

The problem is that this bug affects Debian's PHP5 package of etch[1] and in order to fix it the right patch is required. We can't simply 'update' the source package.

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=431799
 [2007-11-11 22:22 UTC] jakob dot at at gmx dot net
Workaround: Kill those lurking processes regularly using a cronjob.
This works for me (Ubuntu Dapper, PHP 5.1.2 (cgi-fcgi) (built: Jul 17 2007 17:21:59) ); you probably need pkill -9.

#!/bin/bash
pkill -f -x /usr/lib/cgi-bin/php -P 1
 [2008-05-06 21:08 UTC] jakobunt at gmail dot com
I still experience this on Ubuntu Hardy, 
PHP 5.2.4-2ubuntu5 with Suhosin-Patch 0.9.6.2 (cgi-fcgi), so this should be reopened.

A pstree showing the orphaned processes:
http://launchpadlibrarian.net/14265483/phpkiller.log
 [2008-05-06 21:35 UTC] gabriel at oxeva dot fr
Bug reopened as requested by jakobunt at gmail dot com
It is indeed the same bug as the one I described: my logs were the same as yours when I was using the php forking feature.

As far as I remember, the child PHP processes were abandoned by the parent PHP process (acting as a dispatcher) because Apache's FastCGI process manager ("Fastcgi PM") sent a SIGKILL signal to it (the only PID the FastCGI PM is aware of, because it spawned it). This signal is normally only sent if the process does not exit within a few seconds of being sent a SIGTERM signal. The problem is not the parent being killed, but the children waiting in their own loop indefinitely.

I guess this bug is in the fastcgi accept loop, which leaves the php children stalled waiting on a FD without any process attached to the other side.
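
One way a worker could notice that it has been orphaned is to compare getppid() with the PID of the manager that forked it. The sketch below only illustrates that idea under that assumption; it is not how PHP's accept loop works, and the names parent_is_gone and demo_orphan_detection are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* True once the process that forked us has exited and we were re-parented
 * (to init, PID 1, or to a subreaper). */
int parent_is_gone(pid_t original_parent)
{
    return getppid() != original_parent;
}

/* Demo: a manager forks one worker and exits at once. Returns 1 if the
 * worker noticed it was orphaned, 0 if it did not within about 2 seconds,
 * -1 on setup error. */
int demo_orphan_detection(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t manager = fork();
    if (manager < 0)
        return -1;

    if (manager == 0) {
        /* Record our PID before forking so the worker cannot race
         * with our exit when it later calls getppid(). */
        pid_t manager_pid = getpid();
        pid_t worker = fork();
        if (worker == 0) {
            struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
            close(fds[0]);
            for (int i = 0; i < 200; i++) {
                if (parent_is_gone(manager_pid)) {
                    char ok = 1;
                    _exit(write(fds[1], &ok, 1) == 1 ? 0 : 1);
                }
                nanosleep(&tick, NULL);
            }
            _exit(1);
        }
        _exit(0);   /* the manager dies here, orphaning the worker */
    }

    close(fds[1]);
    waitpid(manager, NULL, 0);

    char flag = 0;
    ssize_t n = read(fds[0], &flag, 1);   /* EOF if the worker gave up */
    close(fds[0]);
    return (n == 1 && flag == 1) ? 1 : 0;
}
```

The cron workaround posted earlier in this thread relies on the same re-parenting: pkill -P 1 matches processes whose parent has become init. A worker that checks for itself needs to wake up periodically, so the check only helps if the worker is not blocked indefinitely in read() or accept().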
 [2008-05-18 00:39 UTC] jani@php.net
Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows (zip):
 
  http://snaps.php.net/win32/php5.2-win32-latest.zip

For Windows (installer):

  http://snaps.php.net/win32/php5.2-win32-installer-latest.msi

We do not support 3rd party stuff, so PLEASE try the unpatched, clean sources provided in above link.
 [2008-05-26 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2009-01-21 23:34 UTC] xani666 at gmail dot com
In my case that bug was happening on apache + php 5.2.8 + mod_fastcgi + mod_suexec with about ~20 sites (3-4 busy most time of day, other used from time to time) with fastcgi options -idle-timeout 240 -maxClassProcesses 1 and PHP_FCGI_CHILDREN=4 so processes died quite often.
 [2009-05-16 03:43 UTC] scripts at topducks dot com
I'm using php5.2.9 mod_fcgid, not using children method.
Whenever I restart apache I get orphaned php processes.
Without the cron job to check and kill them off they would waste a lot of memory.
 [2009-07-02 03:41 UTC] porjo38 at yahoo dot com dot au
The php 5.3.0 changelog states the following:

"Fixed bug #40286 (PHP fastcgi with PHP_FCGI_CHILDREN don't kill children when parent is killed). (Dmitry)"

I've just compiled php 5.3.0 under Centos5.3 with Apache2.2.3 + mod_fcgid2.2-4.

The issue is still occurring for me. When I restart Apache, I usually end up with a bunch of php-cgi processes with a ppid of 1 (init), although it doesn't happen every time.
 [2009-07-22 20:14 UTC] bgross at mcw dot edu
I'm not familiar with the inner workings of PHP, so I'm sorry if this is 
not relevant.

I was experiencing a problem with php-cgi processes staying around and 
filling up my memory. After I added the line "cgi.fix_pathinfo=1" to my 
php.ini, the problem went away.

I'm using PHP FastCGI 5.2.6 with Lighttpd 1.4.19 on Debian 5.0.2

Hope that's helpful
 [2009-07-22 20:45 UTC] bgross at mcw dot edu
... on second thought, after looking at my php.ini again, I think the 
major change was due to adding the line "session.gc_probability = 1". I 
believe this is set to "session.gc_probability = 0" by default in Debian
 [2009-07-26 21:55 UTC] machochito at gmail dot com
I have the same problem on CentOS 5.3 with php 5.2.9.
Does anyone have a solution to this problem?
Thanks.
 [2010-10-18 12:06 UTC] jason at backup-technology dot co dot uk
We're experiencing this issue with 5.2.14 and also 5.3.3.

On 5.2.14, the strace of the hanging processes left behind with parent ID 1 shows this:

Process 21330 attached - interrupt to quit
accept(0,  

It hangs on that and if we interrupt it shows:

Process 21330 attached - interrupt to quit
accept(0,  <unfinished ...>

Running a gdb (with debug symbols) and attaching to the process and running "bt" we get:

(gdb) bt
#0  0x000000320c8d4530 in __accept_nocancel () from /lib64/libc.so.6
#1  0x000000000062abe8 in fcgi_accept_request (req=0x7fff3cb385b0) at /usr/src/debug/php-5.2.14/sapi/cgi/fastcgi.c:957
#2  0x000000000062c14f in main (argc=1, argv=0x7fff3cb3a758) at /usr/src/debug/php-5.2.14/sapi/cgi/cgi_main.c:1703

On the 5.3.3 (with no debug symbols) we have the following:

(gdb) bt
#0  0x00000038936d4530 in __accept_nocancel () from /lib64/libc.so.6
#1  0x000000000063e0c3 in ?? ()
#2  0x000000000063ad2a in ?? ()
#3  0x000000389361d994 in __libc_start_main () from /lib64/libc.so.6
#4  0x0000000000421ec9 in _start ()

Hope this helps.

Jason.
 