|
php.net | support | documentation | report a bug | advanced search | search howto | statistics | random bug | login |
[2005-09-02 11:22 UTC] gamr at gamrdev dot com
Description: ------------ current apc and php from cvs, non thread safe [Tue Oct 4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument [Tue Oct 4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument [Tue Oct 4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument [Tue Oct 4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument [Tue Oct 4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument [Tue Oct 4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument [Tue Oct 4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument [Tue Oct 4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument [Tue Oct 4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument it takes roughly 8000 requests from apachebench to cause it to do it normally, testing against apc.php PatchesPull RequestsHistoryAllCommentsChangesGit/SVN commits
|
|||||||||||||||||||||||||||
Copyright © 2001-2025 The PHP GroupAll rights reserved. |
Last updated: Tue Oct 28 17:00:02 2025 UTC |
Suggested fix : --- apc_sem_orig.c 2007-11-25 01:10:21.495337000 -0800 +++ apc_sem.c 2007-11-29 17:26:04.448250000 -0800 @@ -82,12 +82,16 @@ } } - if ((semid = semget(key, 1, IPC_CREAT | IPC_EXCL | perms)) >= 0) { + if ((semid = semget(key, 2, IPC_CREAT | IPC_EXCL | perms)) >= 0) { /* sempahore created for the first time, initialize now */ arg.val = initval; if (semctl(semid, 0, SETVAL, arg) < 0) { apc_eprint("apc_sem_create: semctl(%d,...) failed:", semid); } + arg.val = getpid(); + if (semctl(semid, 1, SETVAL, arg) < 0) { + apc_eprint("apc_sem_create: semctl(%d,...) failed:", semid); + } } else if (errno == EEXIST) { /* sempahore already exists, don't initialize */ @@ -107,7 +111,10 @@ { /* we expect this call to fail often, so we do not check */ union semun arg; - semctl(semid, 0, IPC_RMID, arg); + int semPid = semctl(semid, 1, GETVAL, 0); + if (semPid == getpid()) { + semctl(semid, 0, IPC_RMID, arg); + } } void apc_sem_lock(int semid)Thinking more about why children can continue to exist and attempt to use sempahores which have been deleted and therefore throw errors..... the standard fcgi shutdown code in sapi/cgi/cgi_main.c void fastcgi_cleanup(int signal) { /* Kill all the processes in our process group */ kill(-pgroup, SIGTERM); exit(0); } Just sends signals to the children and then exits, This is bound to result in children surviving the parent by variable amounts of time. WITHOUT brasant's patches in apc: http://pecl.php.net/bugs/bug.php?id=5280 (ie this bug) and php/cgi_main.c http://bugs.php.net/bug.php?id=45423 The parent fcgi process will not call php_module_shutdown(), but all the children will and the first one to do so will call apc_sem_destroy() on each of the semaphores. The rest of them will return failure on semop(IPC_RMID) and apc_sem_destroy() ignores that by design as per comment). If the other children are still busy and have not yet handled the SIGTERM which has been sent to them by the parent they could still call apc_sem_lock() which will result in the error: apc_eprint("apc_sem_lock: semop(%d) failed:", semid); as far as I understand brasant's 2 patches are intended to ensure that: a) http://pecl.php.net/bugs/bug.php?id=5280 only the parent process removes semaphores (ie store pid of process which creates them in a second semaphore in each "set" and only let the process with the same pid IMP_RMID those sempahores b) http://bugs.php.net/bug.php?id=45423 ensure that the parent process actually runs php_module_shutdown() and doesn't just exit(0) in the middle of it's SIGTERM handler. This will mean that apc_sem_destroy() will run in the parent for each of the semaphore sets and the pid should match the pid at semaphore creation. (we actually have some trouble with the pids matching in our setup which I am still investigating, but this is the general idea). ie in summary. Parent creates semaphores and OLNLY the parent destroys them... Nice. BUT....... What we are finding is that permitting *only* the parent to destroy the semaphores is *not sufficient*. Because by the time the parent calls php_module_shutdown() it is *not* guaranteed that all the children have finished running and using semaphores. So on a busy concurrent mult-core server some of the children which are slower to die will throw errors because the semaphores are gone. The only "solution" we have found for this so far is by stopping *all* php-fcgi process deleting semaphores by commenting out the IPC_RMID line: void apc_sem_destroy(int semid) { /* we expect this call to fail often, so we do not check */ /* semctl(semid, 0, IPC_RMID, arg); */ } and then cleaning up the semaphores in the bash rc script which starts/stop the fcgi server: stop_postcmd() { rm -f ${pidfile} # first ensure that no fcgi processes are running because we will break them without semaphores while pgrep 'php-cgi' > /dev/null; do :; done; # list all semaphores owned by our user and remove them ipcs | awk "{ if (\$5 == \"${fcgiphp_user}\") print \$2}" | xargs -n1 ipcrm -s } Note that the while loop above *is required* because on a busy server after the parent dies the children may still be running. Only when all php-fcgi process are properly dead can we remove the sempahores. Without the while loop we get "apc_sem_lock: semop(1234) failed:" errors on shutdown. All of this was a bit of a surprise to us but we feel reasonably comfortable that this interpretation is correct. Clearly the "solution" using the rc script is not great/acceptable. However without a significant modification to php/cgi_main.c or some new concept we are not sure how else this can be solved. One such "new concept" might be yet another semaphore which signals whether any children are live and using semaphores. then the parent process could wait on this "children_alive" semaphore and only then remove all the sempahores incl this new one. Alternatively we could "detect if parent" and if so check for other processes in the same process group and wait for them to die before removing semaphores. The latter idea is very fcgi specfic and kind of moves the bash code above into apc_shutdown() Does all this make sense? Other, better ideas? OliverI think Oliver understood my intention correctly. However I don't understand what is the remaining critical issue. I understand that the warning messages are now misleading. So according to Oliver, child process still exists even though parent dies. If this is the case then php fastcgi should be patched, not APC. php fastcgi framework should kill all child process after some timeout before parent dies. Assuming that we don't have this fix now in php then 1. It is not harmful to cleanup semaphore by parent because process is anyway dieing and warning is harmless. 2. In APC, if we have a track of "is_shutting_down" flag then we can implement something like : if (semop failed and if (is_shutting_down )) then don't show up the message. unless verbose mode is enabled. 3. In stop_postcmd(), we have a line : while pgrep 'php-cgi' > /dev/null; do :; done; The fix assumes that there is single php-cgi server is running. If somebody is running multiple lighttpd/apache servers then all of those will be killed. Virtual hosting site might be the real world scenario. My opinion is that we should fix this issue in php fastcgi, not in apc. Fastcgi should kill all childs (and make sure all children are killed by SIGKILL after some timeout) before parent dies. Meanwhile, apc can adopt solution 2.