php.net
Request #77060 PHP-FPM pm.process_idle_timeout behaviour
Submitted: 2018-10-25 14:30 UTC Modified: 2022-03-07 16:57 UTC
Votes:3
Avg. Score:3.3 ± 0.5
Reproduced:2 of 2 (100.0%)
Same Version:0 (0.0%)
Same OS:1 (50.0%)
From: contact at sshilko dot com Assigned: bukka
Status: Assigned Package: FPM related
PHP Version: 7.4.28 OS: Amazon Linux 2
Private report: No CVE-ID: None

 [2018-10-25 14:30 UTC] contact at sshilko dot com
Description:
------------
Hello, I manage multiple AWS EC2 servers with Docker running on them.
I compile and deploy PHP myself and know the configuration options well, to the point of looking up how things work in the PHP source code.

My question (or bug) concerns the php-fpm FastCGI server configuration option
pm.process_idle_timeout
as documented and implemented here:

http://php.net/manual/en/install.fpm.configuration.php
https://github.com/php/php-src/blob/1ad08256f349fa513157437abc4feb245cce03fc/sapi/fpm/fpm/fpm_process_ctl.c#L368

When set to low values, it makes no difference whether it is 10 seconds or 1 second.

A typical scenario: I have 50 php-fpm child workers running, then during a traffic spike they go up to over 150 workers. But when the spike is over (after 10-50 seconds), the number of workers stays between 140 and 150 for another 5-10 minutes and basically never drops back to the previous level of 50.

The documentation for process_idle_timeout states:
"The number of seconds after which an idle process will be killed. Used only when pm is set to ondemand"

That does not appear to be true.

Even though the server is able to serve requests with fewer than 50 workers, it keeps more than 100 running after the spike is over.

Does this happen because requests are distributed round-robin across all 150 workers, so that each of them gets a request in less than 1 second?

I ask because my setting is process_idle_timeout=1.
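A back-of-the-envelope sketch of the round-robin hypothesis above: if every request goes to the next worker in turn, each of N workers is picked once every N requests, so at a steady rate of R requests per second no worker waits longer than N / R seconds. The helper names are hypothetical, and the model ignores service time and how often FPM actually checks for idle children.

```python
def max_idle_gap(workers: int, requests_per_second: float) -> float:
    """Longest wait between requests for any worker under ideal round-robin.

    Idealised model (an assumption, not FPM code): each request goes to the
    next worker in turn and service time is negligible, so every worker is
    picked once every `workers` requests.
    """
    return workers / requests_per_second


def idle_workers_reaped(workers: int, requests_per_second: float, idle_timeout: float) -> bool:
    """True if, under ideal round-robin, a worker ever waits longer than the timeout."""
    return max_idle_gap(workers, requests_per_second) > idle_timeout


# 150 workers with a 1-second idle timeout.
print(max_idle_gap(150, 300))             # 0.5
print(idle_workers_reaped(150, 300, 1))   # False
print(idle_workers_reaped(150, 100, 1))   # True
```

In this idealised model, 150 workers with a 1-second timeout all stay alive at any steady load of 150 req/s or more, which would be consistent with the behaviour described.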



Test script:
---------------
Linux ip-172-30-0-201.ec2.internal 4.14.47-64.38.amzn2.x86_64 #1

PHP-FPM 7.2.11

[global]
process_control_timeout=10s
process.max=210
process.priority=0
log_level=warning

[www]
user = www-data
group = www-data
listen = /var/run/php7-fpm.sock
listen.owner = www-data
listen.group = www-data
pm.process_idle_timeout = 1;
security.limit_extensions=.php
;when pm=ondemand, backlog cant be lower than 511
listen.backlog = 511
request_terminate_timeout=902s
listen.allowed_clients=127.0.0.1
pm = ondemand
pm.max_children = 150
pm.start_servers = 80
pm.min_spare_servers = 10
pm.max_spare_servers = 20
pm.max_requests=400


Nginx
    fastcgi_index rawrpc.php;
    try_files $uri =404;
    include /etc/nginx/fastcgi_params;
    fastcgi_pass unix:/var/run/php7-fpm.sock;
    fastcgi_buffer_size 4k;
    fastcgi_buffering on;
    fastcgi_buffers 8 4k;
    fastcgi_cache nginxcache;
    fastcgi_cache_bypass $skip_cache;
    fastcgi_cache_valid 200 5m;
    fastcgi_connect_timeout 15;
    fastcgi_keep_conn on;
    fastcgi_no_cache $skip_cache;
    fastcgi_read_timeout 90;
    fastcgi_request_buffering on;
    fastcgi_send_timeout 5;

    fastcgi_param SCRIPT_FILENAME $request_filename;

  }


Expected result:
----------------
Setting process_idle_timeout=1 terminates idle workers down to the previous (pre-spike) level.

Actual result:
--------------
Setting process_idle_timeout=1 has no effect on the number of spawned workers; all the spawned child processes keep running for a few more hours.

History

 [2018-10-28 18:26 UTC] bukka@php.net
-Status: Open
+Status: Feedback
-Assigned To:
+Assigned To: bukka
 [2018-10-28 18:26 UTC] bukka@php.net
Well, this option only has an effect if you have requests running longer than process_idle_timeout seconds (in your case, more than 1 second). Basically, the timer is reset after the response is returned, so the child won't get killed if the request is short, which is usually the case. It is meant only to kill long-running requests. Do you have such long-running requests, or are your requests quick?
 [2018-10-29 07:17 UTC] contact at sshilko dot com
My 99th-percentile server (php-fpm) response times are within 30-100 ms, as measured by AWS ELB and New Relic.

If it really only affects php-fpm children whose requests take longer than process_idle_timeout, that may explain the behaviour I see.

A fix would be a different behaviour (scheduler or time measurement), allowing values below 1 second, or at least updating the documentation.
 [2018-11-04 17:12 UTC] bukka@php.net
-Status: Feedback
+Status: Open
-Type: Bug
+Type: Documentation Problem
 [2018-11-04 17:12 UTC] bukka@php.net
I agree that the docs should be a bit clearer, so I'm changing this to a documentation bug.

I think that to address your problem, we would need to have a smarter process manager that can properly scale in and out.
 [2022-03-06 22:53 UTC] bukka@php.net
-Type: Documentation Problem
+Type: Feature/Change Request
 [2022-03-06 22:53 UTC] bukka@php.net
I have just been looking into this while updating the docs, and I see that I must have confused it with request_terminate_timeout, which is the timeout for a running request. process_idle_timeout currently applies only to the accept stage, meaning the time the process waits for a connection. There are some plans to extend it to handle keepalive (waiting for an FCGI read), but that's not the issue here. As noted, the real issue is the round-robin request selection caused by the C accept() call. We already have a feature request for this, https://bugs.php.net/bug.php?id=77959, but this report is older, so I will keep it open and close it once this is addressed (epoll-based request accepting). I have this on my list, so it should hopefully happen in the future.

As for the docs, they are actually correct. After trying to clarify them, I think it's better to keep them as they are, because the explanation would get quite confusing. We would probably need a whole section describing how everything works, which I would like to write in the future, but first we need to address all these issues.
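To illustrate the difference between round-robin selection and epoll-based accepting, here is a small simulation. It assumes that a plain blocking accept() hands connections to waiting workers first-in, first-out (spreading load across all of them) and that an epoll-style design favours the most recently idle worker (last-in, first-out). Both are modelling assumptions about kernel wake-up order, and the function and its parameters are hypothetical, not FPM code.

```python
from collections import deque


def surviving_workers(workers, rate, duration, idle_timeout, policy, service_time=0.05):
    """Number of workers still alive after `duration` seconds of steady load.

    Simplified model (assumptions, not FPM's real code):
    - one request arrives every 1/rate seconds and takes `service_time` seconds;
    - a worker is killed once it has waited longer than `idle_timeout` seconds
      for a new connection;
    - policy "fifo": the longest-waiting idle worker gets the next request
      (round-robin style spreading);
    - policy "lifo": the most recently idle worker gets the next request.
    Requests that find no idle worker are simply dropped here, whereas real
    ondemand mode would fork a new child.
    """
    idle = deque((0.0, w) for w in range(workers))  # (idle_since, worker_id)
    busy = []  # (finish_time, worker_id)
    step = 1.0 / rate
    t = 0.0
    while t < duration:
        # Workers whose request has finished rejoin the idle set, oldest first.
        idle.extend(sorted(job for job in busy if job[0] <= t))
        busy = [job for job in busy if job[0] > t]
        # Reap workers that have waited too long for a connection.
        idle = deque(job for job in idle if t - job[0] <= idle_timeout)
        # Hand the new request to an idle worker according to the policy.
        if idle:
            _, w = idle.popleft() if policy == "fifo" else idle.pop()
            busy.append((t + service_time, w))
        t += step
    return len(idle) + len(busy)


print(surviving_workers(150, 400, 10, 1, "fifo"))  # 150: every worker survives
print(surviving_workers(150, 400, 10, 1, "lifo"))  # only a small "hot" set survives
```

Under FIFO, each of the 150 workers is reused roughly every third of a second, so none ever reaches the 1-second timeout. Under LIFO, the same load is served by about 20 hot workers and the rest are reaped shortly after the first second.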
 [2022-03-07 16:57 UTC] contact at sshilko dot com
-PHP Version: 7.2.11
+PHP Version: 7.4.28
 [2022-03-07 16:57 UTC] contact at sshilko dot com
pm.process_idle_timeout mixed

The number of seconds after which an idle process will be killed. Used only when pm is set to ondemand. Available units: s(econds)(default), m(inutes), h(ours), or d(ays). Default value: 10s.
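The documented format combines a number with an optional unit suffix. As a hedged illustration of how such values map to seconds, here is a hypothetical parser. It only mirrors the documented units and is not FPM's own parser. It assumes values are whole numbers, which the unit list suggests but the quoted text does not state. Under that assumption, 1 second is the smallest non-zero timeout, which is the limit the reporter ran into.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_fpm_duration(value: str) -> int:
    """Convert a duration such as '10s', '2m', '1h', '1d' or '10' to seconds.

    Hypothetical helper based on the documented units, not FPM's parser.
    Seconds are the default when no suffix is given. Fractional input such
    as '0.5' raises ValueError here, because int() rejects it.
    """
    value = value.strip()
    if value and value[-1] in UNIT_SECONDS:
        number, unit = value[:-1], value[-1]
    else:
        number, unit = value, "s"  # no suffix: seconds
    return int(number) * UNIT_SECONDS[unit]


print(parse_fpm_duration("10s"))  # 10 (the documented default)
print(parse_fpm_duration("1"))    # 1
print(parse_fpm_duration("2m"))   # 120
```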

Yes, the other issue, #77959, describes the same behaviour I see.
The idea is to free memory and other resources more dynamically with an ondemand + process_idle_timeout configuration. That would help with monitoring and would keep the number of php-fpm workers in sync with the real-world load seen by the load balancer.

It's not a critical bug, nor a big performance improvement, but more of an inconvenience when managing high-load php-fpm.

By the way, since then I have split my application into multiple php-fpm pools, and it did help.
Now I don't see a massive number of workers after a traffic spike, because I limit the maximum number of workers per pool:

* php-fpm does not run out of workers, because there are multiple pools instead of one pool eating up all workers (for example, an email campaign consuming all resources and leaving no php-fpm workers for other requests)
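A toy model of why per-pool limits help. It hands out workers first come, first served and ignores timing entirely. The workload names, numbers and the helper itself are hypothetical; each cap stands in for a separate pool's pm.max_children.

```python
def grant_workers(demands, total, caps=None):
    """Hand out workers to workloads in arrival order.

    demands: list of (workload, workers_wanted) in arrival order.
    total:   overall number of workers available.
    caps:    optional per-workload limit (one pool per workload, each with its
             own pm.max_children); None means everything shares one pool.
    """
    granted = {}
    left = total
    for name, wanted in demands:
        limit = wanted if caps is None else min(wanted, caps[name])
        got = min(limit, left)
        granted[name] = got
        left -= got
    return granted


spike = [("email_campaign", 200), ("api", 40)]  # hypothetical workloads
print(grant_workers(spike, 150))
# {'email_campaign': 150, 'api': 0}
print(grant_workers(spike, 150, caps={"email_campaign": 100, "api": 50}))
# {'email_campaign': 100, 'api': 40}
```

With one shared pool, the first burst takes every worker; with separate capped pools, the other workload still gets its workers.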

Thanks for the feedback anyway.
 
PHP Copyright © 2001-2022 The PHP Group
All rights reserved.
Last updated: Mon Aug 08 14:05:43 2022 UTC