Bug #43610 fastcgi socket dies on high concurrency
Submitted: 2007-12-16 21:55 UTC Modified: 2008-09-03 01:00 UTC
Votes:11
Avg. Score:4.5 ± 0.8
Reproduced:9 of 9 (100.0%)
Same Version:9 (100.0%)
Same OS:5 (55.6%)
From: oliver at realtsp dot com Assigned:
Status: No Feedback Package: CGI/CLI related
PHP Version: 5.2.5 OS: FreeBSD 6.2
Private report: No CVE-ID: None

 [2007-12-16 21:55 UTC] oliver at realtsp dot com
Description:
------------
Version information below.

When I load the server with siege, once the fastcgi-php parent process reaches a load of ~200 concurrent requests, the process appears to crash and refuses to accept further connections, even after the load is removed. The only way to recover is to restart lighttpd and thereby the fastcgi-php server (and its children).

Clearly, a load of 200+ concurrent requests is an overload and probably not sustainable. However, a non-recoverable crash means that even after a temporary load spike goes away (produced, for example, by some aggressive robot on our production setup), the server remains unusable and returns 500 responses.

The PHP version is as below, but patched with this:

http://cvs.php.net/viewvc.cgi/php-src/main/SAPI.c?r1=1.202.2.7.2.15&r2=1.202.2.7.2.16&pathrev=PHP_5_2&diff_format=u

because of this bug:

http://bugs.php.net/bug.php?id=43295

That patch removes the errors on

/root/php-5.2.5/main/SAPI.c(445)

but the "overload crash" remains.


[root@muriwai /usr/ports/lang/php5]# lighttpd -v
lighttpd-1.4.18 (ssl) - a light and fast webserver
Build-Date: Dec  5 2007 18:23:49

fastcgi.server             = ( ".php" =>
                               ( "localhost" =>
                                 (
                                   "socket" => "/var/run/lighttpd/php-fastcgi.socket",
                                   "bin-path" => "/usr/local/bin/php-cgi",
                                   "max-procs" => 1,
                                   "bin-environment" => (
                                     "PHP_FCGI_CHILDREN" => "16",
                                     "PHP_FCGI_MAX_REQUESTS" => "500" ),
                                   "broken-scriptfilename" => "enable"
                                 )
                               )
                            )

[root@muriwai /usr/ports/lang/php5]# php-cgi -v
PHP 5.2.5 (cgi-fcgi) (built: Dec 16 2007 20:47:09) (DEBUG)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies


[root@muriwai /usr/ports/lang/php5]# php-cgi -m
[PHP Modules]
cgi-fcgi
date
libxml
Reflection
standard

[Zend Modules]

NOTE: no opcode cache or third party extensions

php.ini parsed is "none" (i.e. all defaults)


FreeBSD muriwai.realtsp.com 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 12 08:43:30 UTC 2007     root@portnoy.cse.buffalo.edu:/usr/obj/usr/src/sys/SMP  amd64



Reproduce code:
---------------
A trivial script will do:

<?php

sleep(1);
phpinfo();


with this .siegerc:

#
# Default number of simulated  concurrent users
# ex: concurrent = 25
#
concurrent = 250



Expected result:
----------------
The php fastcgi parent process (and its children) should remain stable. An "overloaded" or even a temporary 500-type response would be acceptable, but crashing and then being completely unresponsive is a bit of an issue.



Actual result:
--------------
lighttpd.error.log reports this

2007-12-16 21:19:22: (mod_fastcgi.c.1731) connect failed: Connection refused on unix:/var/run/lighttpd/php-fastcgi.socket-87058-0
2007-12-16 21:19:22: (mod_fastcgi.c.2885) backend died; we'll disable it for 5 seconds and send the request to another backend instead: reconnects: 0 load: 210
2007-12-16 21:19:22: (mod_fastcgi.c.3496) all handlers for  /index.php on .php are down.

I haven't managed to get a backtrace yet, because this is not that easy with a fastcgi process, but I am working on it.

NOTE: despite what lighty says above, it does not restart the PHP parent process. I am not sure why, but I believe this is a separate issue.

History
 [2007-12-17 10:44 UTC] oliver at realtsp dot com
We have tried with:

  http://snaps.php.net/php5.2-latest.tar.gz

Result is unchanged. 

NOTE that the PHP worker and parent processes are still showing in ps after the crash (same as before the crash), but lighty cannot get a sensible response from them.

[root@muriwai /usr/ports/lang/php5]# pstree  
...
 |-+- 25262 www /usr/local/sbin/lighttpd -f /usr/local/etc/lighttpd.conf
 | \-+= 25263 www /usr/local/bin/php-cgi
 |   |--- 25264 www /usr/local/bin/php-cgi
 |   |--- 25265 www /usr/local/bin/php-cgi
 |   |--- 25266 www /usr/local/bin/php-cgi
 |   |--- 25267 www /usr/local/bin/php-cgi
 |   |--- 25268 www /usr/local/bin/php-cgi
 |   |--- 25269 www /usr/local/bin/php-cgi
 |   |--- 25270 www /usr/local/bin/php-cgi
 |   |--- 25271 www /usr/local/bin/php-cgi
 |   |--- 25272 www /usr/local/bin/php-cgi
 |   |--- 25273 www /usr/local/bin/php-cgi
 |   |--- 25274 www /usr/local/bin/php-cgi
 |   |--- 25275 www /usr/local/bin/php-cgi
 |   |--- 25276 www /usr/local/bin/php-cgi
 |   |--- 25277 www /usr/local/bin/php-cgi
 |   |--- 25278 www /usr/local/bin/php-cgi
 |   \--- 25279 www /usr/local/bin/php-cgi
....
 [2007-12-17 13:05 UTC] oliver at realtsp dot com
Actually......

It turns out that the PHP parent is not dead at all. Even with stable 5.2.5 (rather than 5.2-latest), if you set up the fastcgi server to be started separately from lighty, i.e. with a lighty config like this:

fastcgi.server             = ( ".php" =>
                               ( "localhost" =>
                                 (
                                   "socket" => "/tmp/php-fastcgi.sock"
                                 )
                               )
                            )

and then use spawn_fcgi to start the PHP fcgi server manually, then everything behaves as expected: you get some (not all!!) 500s while the overload condition exists, and when the load drops away you get all normal 200 responses again, i.e. the elastic/tolerant performance hoped for.

After some investigation into the lighty source, it turns out that lighty is confused by the fact that PHP just fails to respond (i.e. times out) rather than returning FCGI_OVERLOADED. Refer to this:

http://bugs.php.net/bug.php?id=39809

where dimitry said:

"PHP cannot return FCGI_OVERLOADED, because all PHP processes are busy and nobody accepts new connection. The only way to detect this situation - use connection timeout."

lighty, however, sticks to the FastCGI spec and expects the PHP parent to be in shutdown mode (i.e. its PID to disappear) when it does not respond, after which it would respawn a new parent. But because the PHP parent is just busy and not actually shutting down, the PID never disappears and lighty gets stuck in a loop.
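The timeout-based detection Dmitry describes could be approximated on the client side roughly as below. This is a hedged Python sketch, not lighttpd's actual logic: `probe_backend` and its result labels are hypothetical, and how a full listen backlog on a Unix-domain socket surfaces (an immediate ECONNREFUSED, as in the log above, or a connect that blocks until the timeout) can differ between operating systems.

```python
import socket


def probe_backend(path: str, timeout_s: float = 1.0) -> str:
    """Classify a FastCGI Unix-socket backend by trying to connect to it.

    Returns one of the (hypothetical) labels "ok", "refused", "timeout"
    or "missing". A successful connect only proves the listening socket
    exists and had room; a busy but alive backend may still show up as
    "refused" or "timeout", which is exactly the ambiguity discussed here.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout_s)
    try:
        s.connect(path)
        return "ok"
    except FileNotFoundError:
        # No socket file at that path: the backend was never started.
        return "missing"
    except ConnectionRefusedError:
        # Nothing listening, or (on some systems) the backlog is full.
        return "refused"
    except socket.timeout:
        # connect() did not complete within timeout_s.
        return "timeout"
    finally:
        s.close()
```

Note that "refused" alone cannot tell a dead parent from a busy one, which is why a client that treats it as "backend died" can end up waiting for a PID that never disappears.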

I have posted a workaround involving starting PHP separately here:

http://trac.lighttpd.net/trac/ticket/1488

which also proposes a "patch" to deal with PHP's non-standard behaviour regarding FCGI_OVERLOADED.

However, the fundamental problem remains: it is very difficult for a FastCGI client to determine what is going on, and therefore what to do, when PHP just times out on connections rather than returning the correct FCGI_OVERLOADED response.
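For reference, "returning FCGI_OVERLOADED" means, per the FastCGI 1.0 specification, answering a request with an FCGI_END_REQUEST record whose protocolStatus field is FCGI_OVERLOADED. A minimal Python sketch of that record's encoding, with constants taken from the spec and a hypothetical helper name `end_request_record`:

```python
import struct

# Constants from the FastCGI 1.0 specification.
FCGI_VERSION_1 = 1
FCGI_END_REQUEST = 3          # record type
FCGI_REQUEST_COMPLETE = 0     # protocolStatus values
FCGI_CANT_MPX_CONN = 1
FCGI_OVERLOADED = 2
FCGI_UNKNOWN_ROLE = 3


def end_request_record(request_id: int, protocol_status: int, app_status: int = 0) -> bytes:
    """Encode an FCGI_END_REQUEST record (8-byte header + 8-byte body)."""
    # FCGI_EndRequestBody: appStatus (4 bytes, big-endian),
    # protocolStatus (1 byte), reserved (3 bytes).
    body = struct.pack(">IB3x", app_status, protocol_status)
    # FCGI_Header: version, type, requestId (2 bytes), contentLength (2 bytes),
    # paddingLength, reserved.
    header = struct.pack(">BBHHBx", FCGI_VERSION_1, FCGI_END_REQUEST,
                         request_id, len(body), 0)
    return header + body


if __name__ == "__main__":
    print(end_request_record(1, FCGI_OVERLOADED).hex())
```

This record can only be written by a process that has already accept()ed the connection and read the request's id; when every child is busy, no process is there to send it, which is the crux of Dmitry's comment.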

I did not understand Dmitry's original reason for this: "PHP cannot return FCGI_OVERLOADED, because all PHP processes are busy and nobody accepts new connection."

Could you explain, or perhaps review, PHP's behaviour under overloaded conditions?

Thanks

Oliver
 [2007-12-22 15:10 UTC] olafvdspek at gmail dot com
> Could you explain or perhaps review PHP's behaviour under overloaded conditions.

I'm no PHP developer and haven't looked at the code, but here is my guess:
A PHP process has C children, each being able to handle one connection. When that connection is closed, it'll do an accept() to handle a new connection.
When a web server opens more than C connections, those will not be accepted until an existing connection is closed, which may take a long time.
So a web server should never open more than C connections to one PHP process.
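The model above can be put into rough numbers with a small Python sketch. It assumes an idealized case: every request arrives at the same instant, each child serves one request at a time in FIFO order, and every request takes the same time (about 1 s for the `sleep(1)` test script, ignoring the cost of `phpinfo()`). The function name `queue_waits` is hypothetical.

```python
def queue_waits(n_requests: int, children: int, service_s: float) -> list:
    """Seconds each request waits before a child accept()s it.

    Idealized model: all requests arrive at t=0, `children` workers,
    constant service time. Request k is picked up in "round" k // children.
    """
    if children <= 0:
        raise ValueError("children must be positive")
    return [(k // children) * service_s for k in range(n_requests)]


if __name__ == "__main__":
    # Figures from this report: PHP_FCGI_CHILDREN=16, siege concurrent=250.
    waits = queue_waits(250, 16, 1.0)
    print("served immediately:", sum(1 for w in waits if w == 0))
    print("longest wait (s):", waits[-1])
```

With the report's figures, 16 requests are served immediately and the last one waits 15 s, long enough for a web server's connect or read timeout to fire. In practice, connections beyond the socket's listen backlog may be refused outright rather than queued, which would be consistent with the "Connection refused" lines in the log.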
 [2007-12-23 11:31 UTC] oliver at realtsp dot com
@olafvdspek at gmail dot com

That is not in keeping with the FastCGI spec:

#  FCGI_OVERLOADED: rejecting a new request. This happens when the application runs out of some resource, e.g. database connections.

The situation I am talking about here is a severely overloaded condition, i.e. all PHP worker (child) processes are already busy and there is a queue of, in my case, an additional 200+ connections.

My suggestion is that the PHP parent process allows a max_fastcgi_queue of, say, 200 and then rejects further connections with FCGI_OVERLOADED. Since the parent process manages this queue, it should know its size, and it "should" be easy to place a maximum limit on that size. The limit could be configured in php.ini.
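The proposed limit amounts to a simple admission check: hand a request to an idle worker if there is one, otherwise hold it in a bounded wait queue, and answer with FCGI_OVERLOADED once that queue is full. Below is a hypothetical Python illustration of the idea only. `BoundedAdmission` is an invented name, `max_fastcgi_queue` is the setting proposed here rather than an existing php.ini directive, and in the php-cgi of this era the children appear to call accept() on the shared socket themselves, so the real queue is the kernel's listen backlog rather than anything the parent holds.

```python
from collections import deque
from typing import Optional


class BoundedAdmission:
    """Hypothetical admission control for a pool of FastCGI workers."""

    def __init__(self, children: int, max_fastcgi_queue: int) -> None:
        self.idle = children                 # workers free to take a request
        self.max_queue = max_fastcgi_queue   # proposed limit on waiting requests
        self.waiting = deque()               # request ids waiting for a worker

    def arrive(self, request_id: int) -> str:
        """Return "running", "queued" or "overloaded" for a new request."""
        if self.idle > 0:
            self.idle -= 1
            return "running"
        if len(self.waiting) < self.max_queue:
            self.waiting.append(request_id)
            return "queued"
        # Queue full: this is where an FCGI_END_REQUEST record with
        # protocolStatus = FCGI_OVERLOADED (2) would be sent back.
        return "overloaded"

    def finish(self) -> Optional[int]:
        """A worker finished; give it the oldest queued request, if any."""
        if self.waiting:
            return self.waiting.popleft()
        self.idle += 1
        return None
```

The design keeps rejection cheap and immediate, so a client sees an explicit "overloaded" answer instead of having to infer the condition from a timeout.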
 [2007-12-23 11:53 UTC] olafvdspek at gmail dot com
> Since the parent process manages this queue

Eh, are you sure it does? As far as I know that's not true.
 [2008-09-03 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 13:01:29 2024 UTC