Bug #56521 semop(655371) failed: Invalid argument
Submitted: 2005-09-02 11:22 UTC Modified: 2008-11-17 14:24 UTC
From: gamr at gamrdev dot com Assigned:
Status: Closed Package: APC (PECL)
PHP Version: 5_1 CVS-2005-09-02 (dev) OS: Gentoo Linux
Private report: No CVE-ID:
 [2005-09-02 11:22 UTC] gamr at gamrdev dot com
Description:
------------
current apc and php from cvs, non thread safe

[Tue Oct  4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument
[Tue Oct  4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument
[Tue Oct  4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument
[Tue Oct  4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument
[Tue Oct  4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument
[Tue Oct  4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument
[Tue Oct  4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument
[Tue Oct  4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument
[Tue Oct  4 01:40:40 2005] [apc-error] apc_sem_lock: semop(655371) failed: Invalid argument


it takes roughly 8000 requests from apachebench to cause it to do it normally, testing against apc.php


 [2005-09-02 11:30 UTC] rasmus@php.net
Stick with fcntl locking for now.  eg. these configure flags:

./configure --enable-apc --enable-apc-mmap --with-apxs

And a php.ini block similar to this:

extension=apc.so
apc.enabled=1
apc.shm_segments=1
apc.optimization=0
apc.shm_size=64
apc.ttl=7200
apc.user_ttl=7200
apc.num_files_hint=500
apc.mmap_file_mask=/tmp/apc.XXXXXX
apc.enable_cli=1
 [2006-02-27 07:33 UTC] mike@php.net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.


 [2007-11-26 02:02 UTC] basant dot kukreja at sun dot com
I also observed this error:

Platform : Solaris 10 
apache : httpd-2.2.6
php : 5.2.4
apc : 3.0.14


For simplicity, I set the following in httpd-mpm.conf

StartServers 1
MinSpareServers 1
MaxSpareServers 1
MaxClients 1
ServerLimit 1
# MaxClients 150
MaxRequestsPerChild 2

With the above configuration, Apache creates only a single worker process.
The worker process dies after 2 requests and Apache starts a new one.

I created a simple test.php:
<?
print "hello world";
?>

I sent 3 requests for "/test.php", having put a few printfs in the APC code.

Here is the log file output (the second [] contains the pid):

[Sun Nov 25 00:15:29 2007] [warn] Init: Session Cache is not configured [hint: SSLSessionCache]
[Sun Nov 25 00:15:39 2007] [1772] [apc-debug] apc_sem_create: semget returned 16777301 initval=1
[Sun Nov 25 00:15:49 2007] [1772] [apc-debug] apc_sem_create: semget returned 16777302 initval=1
[Sun Nov 25 00:15:59 2007] [1772] [apc-debug] apc_sem_create: semget returned 16777303 initval=1
[Sun Nov 25 00:15:59 2007] [notice] Digest: generating secret for digest authentication ...
[Sun Nov 25 00:15:59 2007] [notice] Digest: done
[Sun Nov 25 00:16:00 2007] [notice] Apache/2.2.6 (Unix) PHP/5.2.4 mod_ssl/2.2.6 OpenSSL/0.9.7d configured -- resuming normal operations
[Sun Nov 25 00:16:09 2007] [1777] [apc-debug] apc_sem_destroy : Destroying semaphore 16777302
[Sun Nov 25 00:16:09 2007] [1777] [apc-debug] apc_sem_destroy : Destroying semaphore 16777303
[Sun Nov 25 00:16:09 2007] [1777] [apc-debug] apc_sem_destroy : Destroying semaphore 16777301
[Sun Nov 25 00:16:10 2007] [error] server reached MaxClients setting, consider raising the MaxClients setting
[Sun Nov 25 00:16:26 2007] [1780] [apc-error] apc_sem_lock: semop(16777302) failed: errno = 22 olderrno=10 retval=-1

What really happens is that the Apache parent process creates the semaphores
and the child processes inherit the semaphore ids. When a child process exits,
it destroys the semaphores. When Apache starts a new child process, it uses a
semid which has been deleted, and hence semop fails.
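
The sequence described above can be reproduced outside Apache with a small C sketch. This is not APC code: stale_semid_errno and union demo_semun are hypothetical names, and a single fork() stands in for a prefork child that removes the set at shutdown while the creator still holds the id.

```c
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The caller has to supply the semctl() argument union on glibc; a distinct
 * (hypothetical) name avoids clashing with systems that define union semun. */
union demo_semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Returns the errno of the failed semop(), 0 if the lock unexpectedly
 * succeeded, or -1 if the setup itself failed. */
int stale_semid_errno(void)
{
    union demo_semun arg;
    struct sembuf lock_op;
    int semid;
    pid_t pid;

    /* The "parent" creates and initializes one semaphore. */
    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0) return -1;
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    if (pid == 0) {
        /* The "child" inherited semid and removes the set on its way out. */
        semctl(semid, 0, IPC_RMID);
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0) return -1;

    /* The surviving process still holds the old id and tries to lock it. */
    lock_op.sem_num = 0;
    lock_op.sem_op = -1;
    lock_op.sem_flg = SEM_UNDO;
    if (semop(semid, &lock_op, 1) == 0) {
        semctl(semid, 0, IPC_RMID);
        return 0;
    }
    return errno;
}
```

On Linux the returned value is typically EINVAL, which corresponds to the "errno = 22" in the log above.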

More information is available at:
http://forum.java.sun.com/thread.jspa?threadID=5237815&start=15&tstart=0
 [2007-11-29 18:35 UTC] basant dot kukreja at sun dot com
Actually, a change in PHP is why this issue appeared in later versions of
PHP. php-5.2.4 registers the shutdown hook "php_apache_child_shutdown" in
apache2. php_apache_child_shutdown tries to call shutdown for all of its
extensions, and eventually calls apc_module_shutdown.
apc_module_shutdown should not clean up the cache. I don't know what the fix
should be, but apc_module_shutdown probably should not be called for every
prefork child Apache process.

Here is the call stack of apc_sem_destroy:
=>[1] apc_sem_destroy(semid = 16777329) (optimized), at 0xfd7d1bbc (line ~114) in "zend_alloc.h"
  [2] apc_cache_destroy(cache = ???) (optimized), at 0xfd7bde88 (line ~338) in "apc_cache.c"
  [3] apc_module_shutdown() (optimized), at 0xfd7ce23c (line ~566) in "apc_main.c"
  [4] zm_shutdown_apc(type = ???, module_number = ???) (optimized), at 0xfd7b92f8 (line ~251) in "php_apc.c"
  [5] module_destructor(0x1e3088, 0x1dd218, 0x25, 0xfec6fad4, 0x0, 0x7c942979), at 0xfdfd5408
  [6] zend_hash_graceful_reverse_destroy(0xfe2ff12c, 0x0, 0x1e3058, 0x0, 0xe4, 0xfe2ff210), at 0xfdfdbea4
  [7] zend_shutdown(0x4fc, 0xfe27acbc, 0x400, 0xfe300420, 0x30c, 0x0), at 0xfdfcc738
  [8] php_module_shutdown(0xfe2ad144, 0x32488, 0x32480, 0xfe27acbc, 0x32400, 0x32400), at 0xfdf73100
  [9] php_module_shutdown_wrapper(0xfe2baf30, 0x0, 0x10ca8, 0xfec32ea0, 0xfed5861c, 0x23d5d0), at 0xfdf7308c
  [10] php_apache_child_shutdown(0x0, 0x293390, 0xfdf73088, 0xfe27acbc, 0x40274, 0x40000), at 0xfe03d710
  [11] apr_pool_destroy(0x293318, 0x293380, 0x297310, 0x0, 0x3, 0x0), at 0xfef1632c
  [12] child_main(0x0, 0x2955f0, 0x68ae4, 0x7fe70, 0x7d33c, 0x1), at 0x52748
  [13] make_child(0x52000, 0x0, 0x0, 0x0, 0x1, 0x7d000), at 0x52ca0
  [14] perform_idle_server_maintenance(0x7d000, 0x0, 0x2, 0x7d32c, 0x802bc, 0x1), at 0x52f44
  [15] ap_mpm_run(0x7d000, 0x0, 0x0, 0x0, 0x7d344, 0xffbf9014), at 0x53750
  [16] main(0x88ce8, 0xffbff100, 0x7d1bc, 0x7c128, 0x618fc, 0x7c118), at 0x27a6c
 [2007-11-29 20:45 UTC] basant dot kukreja at sun dot com
Suggested fix:
--- apc_sem_orig.c      2007-11-25 01:10:21.495337000 -0800
+++ apc_sem.c   2007-11-29 17:26:04.448250000 -0800
@@ -82,12 +82,16 @@
         }
     }

-    if ((semid = semget(key, 1, IPC_CREAT | IPC_EXCL | perms)) >= 0) {
+    if ((semid = semget(key, 2, IPC_CREAT | IPC_EXCL | perms)) >= 0) {
         /* sempahore created for the first time, initialize now */
         arg.val = initval;
         if (semctl(semid, 0, SETVAL, arg) < 0) {
             apc_eprint("apc_sem_create: semctl(%d,...) failed:", semid);
         }
+        arg.val = getpid();
+        if (semctl(semid, 1, SETVAL, arg) < 0) {
+            apc_eprint("apc_sem_create: semctl(%d,...) failed:", semid);
+        }
     }
     else if (errno == EEXIST) {
         /* sempahore already exists, don't initialize */
@@ -107,7 +111,10 @@
 {
     /* we expect this call to fail often, so we do not check */
     union semun arg;
-    semctl(semid, 0, IPC_RMID, arg);
+    int semPid = semctl(semid, 1, GETVAL, 0);
+    if (semPid == getpid()) {
+        semctl(semid, 0, IPC_RMID, arg);
+    }
 }

 void apc_sem_lock(int semid)
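
The idea behind the patch (only the creating process may remove the set) can be sketched as a standalone C program. Note that this sketch deliberately swaps in a different mechanism: instead of storing the pid in a second semaphore, it keeps it in a static variable that forked children inherit. Semaphore values are capped by SEMVMX (32767 on Linux, for example), so SETVAL with a larger pid would fail with ERANGE. The static variable only works when children are forked from the creator without exec. All names here (sem_owner_pid, demo_sem_create, demo_sem_destroy, child_destroy_is_skipped) are hypothetical.

```c
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping: pid of the process that created the set.
 * Forked children inherit the value, so their getpid() will not match. */
static pid_t sem_owner_pid = -1;

int demo_sem_create(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid >= 0) sem_owner_pid = getpid();
    return semid;
}

/* Returns 1 if the set was removed, 0 if removal was skipped or failed. */
int demo_sem_destroy(int semid)
{
    if (getpid() != sem_owner_pid) return 0;   /* not the creator: skip */
    return semctl(semid, 0, IPC_RMID) == 0 ? 1 : 0;
}

/* Returns 1 when a forked child skipped the removal and the set was still
 * usable afterwards, 0 otherwise, -1 on setup failure. */
int child_destroy_is_skipped(void)
{
    int semid, status, survived;
    pid_t pid;

    semid = demo_sem_create();
    if (semid < 0) return -1;

    pid = fork();
    if (pid < 0) {
        demo_sem_destroy(semid);
        return -1;
    }
    if (pid == 0) _exit(demo_sem_destroy(semid));  /* exit status 0 = skipped */
    if (waitpid(pid, &status, 0) < 0) return -1;

    /* GETVAL only succeeds while the set still exists. */
    survived = semctl(semid, 0, GETVAL) >= 0;

    /* The creator is still allowed to remove the set. */
    if (demo_sem_destroy(semid) != 1) return -1;
    return survived && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```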
 [2008-01-05 03:13 UTC] vic_soft at yahoo dot com
I had the same problem with version 3.0.16. After 10 to 20 minutes, the server (Apache) no longer interprets PHP files and just prompts for download. When this happens, a large block of lines like this appears in the log:

[apc-error] apc_sem_lock: semop(4128783) failed: Identifier removed

followed by another one with:

[apc-error] apc_sem_lock: semop(4128783) failed: Invalid argument.

The patch suggested by basant dot kukreja at sun dot com fixed the problem.
 [2008-08-08 15:18 UTC] basant dot kukreja at sun dot com
If you are using FastCGI, you also need another PHP patch:
http://bugs.php.net/bug.php?id=45423

This patch has been merged into cvs.
 [2008-11-11 02:36 UTC] oliver at realtsp dot com
We found that this patch prevents proper cleanup of
semaphores under our configuration; see the discussion here:

http://pecl.php.net/bugs/bug.php?id=14957

We are not using Apache but lighty with FastCGI, and it turns
out that the patch suggested in this bug does not seem to
be required for this setup. We were automatically applying
the patch because it is part of the FreeBSD ports tree.
Removing the patch from the FreeBSD ports tree when building
the port removed the semaphore cleanup problem for us.

Beware! You may be better off without this patch! I am going to
ask the FreeBSD ports maintainer to consider removing this
patch from the APC port, as it is not required in all cases and
has negative side effects.
 [2008-11-17 13:50 UTC] basant dot kukreja at sun dot com
As Oliver said, I also noticed the issue with lighttpd;
that's why I submitted another PHP patch:
http://bugs.php.net/bug.php?id=45423

If you are running PHP 5.2.6, this patch will work fine
with lighttpd. Look at the OpenSolaris sources, which have both
lighttpd and Apache; PHP FastCGI works fine with both.

If you are running PHP < 5.2.6, you have to apply the patch
mentioned in
http://bugs.php.net/bug.php?id=45423
 [2008-11-17 14:24 UTC] shire@php.net
sounds like this has been checked into all the necessary cvs locations.  If you are still having problems please re-open.  Thanks!
 [2008-11-26 11:30 UTC] oliver at realtsp dot com
Sorry, but we need to re-open this. It is still a problem in
APC-CVS as of 2008-11-26 when not using the patch provided
by basant kukreja.

Background: we are running PHP 5.2.6, lighttpd 1.4.19, and APC
3.0.19, with APC-CVS to compare. The FastCGI server is not
spawned by lighty but by an rc script from the command line,
i.e. it is completely separate from lighty and communicates via
a socket.

We are getting these "apc_sem_lock: semop(655371)
failed: Invalid argument" errors in 2 situations:

1) during shutdown.
2) after segfaults of some php-fcgi process.

The first is easier to pin down.

It appears that during shutdown, because the current code
(without basant's patch) calls apc_sem_destroy() in every
fcgi child process, it is possible that some other
child, which hasn't finished dying yet and is still trying
to access cached code in APC, calls apc_sem_lock() and fails
because the semaphore is no longer available.

NOTE!! You have to hit the cache with quite some
concurrency to get this error; an idle server with one CPU
will not suffice. We used 15 concurrent siege users, each
loading 360 PHP code files on every request, to reproduce
this. And we are using an 8-core machine, which makes the
usage truly concurrent. When you have sufficient load,
simply shutting down the FastCGI server by sending it a
SIGTERM is sufficient to get the error.

basant's patch (which has some other issues in our
setup that I am still investigating) doesn't quite address
this problem, because it simply tries to restrict the
removal of semaphores to the parent process. From our
testing it seems that the parent calling
php_module_shutdown() does not guarantee that all its
children are already dead.

In fact, because we have traditionally had problems getting
semaphores cleaned up at all, we were doing it in bash
after the php-fcgi parent shutdown finishes. Even with semaphore
removal at this late stage we can reproduce the problem: the
children are not fully dead and are still trying to use
semaphores which no longer exist.

This needs some discussion as to when the cleanup of
semaphores should really happen. Continue with this bug or
open a new one?
 [2008-11-27 06:16 UTC] oliver at realtsp dot com
Thinking more about why children can continue to exist and
attempt to use semaphores which have been deleted, and
therefore throw errors...

the standard fcgi shutdown code in
sapi/cgi/cgi_main.c

void fastcgi_cleanup(int signal)
{
    /* Kill all the processes in our process group */
    kill(-pgroup, SIGTERM);
    exit(0);
}

This just sends signals to the children and then exits, which is
bound to result in children surviving the parent by variable
amounts of time. WITHOUT basant's patches

in apc:
http://pecl.php.net/bugs/bug.php?id=5280 (i.e. this bug)

and php/cgi_main.c:
http://bugs.php.net/bug.php?id=45423

the parent fcgi process will not call php_module_shutdown(),
but all the children will, and the first one to do so will
call apc_sem_destroy() on each of the semaphores. The rest
of them will get a failure from semctl(IPC_RMID), which
apc_sem_destroy() ignores by design (as per the comment). If
the other children are still busy and have not yet handled
the SIGTERM sent to them by the parent, they
could still call apc_sem_lock(), which will result in the
error:
            apc_eprint("apc_sem_lock: semop(%d) failed:", semid);


As far as I understand, basant's 2 patches are intended to
ensure that:
a) http://pecl.php.net/bugs/bug.php?id=5280
only the parent process removes semaphores (i.e. store the pid of
the process which creates them in a second semaphore in each
"set" and only let the process with the same pid IPC_RMID
those semaphores).

b) http://bugs.php.net/bug.php?id=45423
the parent process actually runs
php_module_shutdown() and doesn't just exit(0) in the middle
of its SIGTERM handler. This will mean that
apc_sem_destroy() will run in the parent for each of the
semaphore sets and the pid should match the pid at semaphore
creation. (We actually have some trouble with the pids
matching in our setup, which I am still investigating, but
this is the general idea.)

In summary: the parent creates semaphores and ONLY the
parent destroys them...

Nice. BUT.......

What we are finding is that permitting *only* the parent to
destroy the semaphores is *not sufficient*, because by the
time the parent calls php_module_shutdown() it is *not*
guaranteed that all the children have finished running and
using semaphores. So on a busy concurrent multi-core server
some of the children which are slower to die will throw
errors because the semaphores are gone.

The only "solution" we have found for this so far is
stopping *all* php-fcgi processes from deleting semaphores by
commenting out the IPC_RMID line:
void apc_sem_destroy(int semid)
{
    /* we expect this call to fail often, so we do not check */
    /* semctl(semid, 0, IPC_RMID, arg); */
}

and then cleaning up the semaphores in the bash rc script
which starts/stops the fcgi server:

stop_postcmd()
{
    rm -f ${pidfile}

    # first ensure that no fcgi processes are running because we will break them without semaphores
    while pgrep 'php-cgi' > /dev/null; do :; done;
    # list all semaphores owned by our user and remove them
    ipcs | awk "{ if (\$5 == \"${fcgiphp_user}\") print \$2}" | xargs -n1 ipcrm -s
}

Note that the while loop above *is required* because on a
busy server the children may still be running after the
parent dies. Only when all php-fcgi processes are properly dead
can we remove the semaphores. Without the while loop we get
"apc_sem_lock: semop(1234) failed:" errors on shutdown.

All of this was a bit of a surprise to us, but we feel
reasonably comfortable that this interpretation is correct.
Clearly the "solution" using the rc script is not
great/acceptable. However, without a significant modification
to php/cgi_main.c or some new concept, we are not sure how
else this can be solved.

One such "new concept" might be yet another semaphore which
signals whether any children are live and using semaphores.
The parent process could then wait on this "children_alive"
semaphore and only then remove all the semaphores, including
this new one.
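
A minimal sketch of how such a "children_alive" semaphore might behave, assuming System V semantics: each child increments the counter with SEM_UNDO, so the kernel decrements it again when the child exits (even after a crash), and the parent blocks in a wait-for-zero semop() before removing anything. None of this is in APC or PHP; wait_for_children_alive_zero, semop_retry, and union demo_semun are hypothetical names, and a pipe is used only so the parent does not start waiting before the children have registered.

```c
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical name for the semctl() argument union (see semctl(2)). */
union demo_semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Single-semaphore semop() that restarts after EINTR. */
static int semop_retry(int semid, short op, short flg)
{
    struct sembuf sb;
    int rc;

    sb.sem_num = 0;
    sb.sem_op = op;
    sb.sem_flg = flg;
    do {
        rc = semop(semid, &sb, 1);
    } while (rc < 0 && errno == EINTR);
    return rc;
}

/* Forks nchildren workers, waits until the "children_alive" counter drops
 * to zero, and returns the counter value seen afterwards (0 on success,
 * -1 on any failure). Note: it reaps with waitpid(-1, ...), so it assumes
 * the caller has no other children. */
int wait_for_children_alive_zero(int nchildren)
{
    union demo_semun arg;
    int semid, fds[2], i, value = -1, started = 0, ready = 0;
    char byte;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0) return -1;
    arg.val = 0;
    if (semctl(semid, 0, SETVAL, arg) < 0 || pipe(fds) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    for (i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0) break;
        if (pid == 0) {
            close(fds[0]);
            /* Register as alive; SEM_UNDO reverts this when we exit. */
            if (semop_retry(semid, 1, SEM_UNDO) < 0) _exit(1);
            if (write(fds[1], "x", 1) != 1) _exit(1);
            sleep(1);               /* stand-in for request handling */
            _exit(0);
        }
        started++;
    }
    close(fds[1]);

    /* Wait until every started child has registered itself. */
    while (ready < started && read(fds[0], &byte, 1) == 1) ready++;
    close(fds[0]);

    /* sem_op == 0 blocks until the counter is zero, i.e. all children exited. */
    if (ready == started && started == nchildren &&
        semop_retry(semid, 0, 0) == 0)
        value = semctl(semid, 0, GETVAL);

    for (i = 0; i < started; i++) waitpid(-1, NULL, 0);
    semctl(semid, 0, IPC_RMID);     /* only now remove the semaphore */
    return value;
}
```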

Alternatively we could "detect if parent" and if so check
for other processes in the same process group and wait for
them to die before removing semaphores. 

The latter idea is very fcgi-specific and kind of moves the
bash code above into apc_shutdown().

Does all this make sense? Other, better ideas?

Oliver
 [2008-11-27 15:50 UTC] basant dot kukreja at gmail dot com
I think Oliver understood my intention correctly.

However, I don't understand what the remaining critical issue is. I understand
that the warning messages are now misleading. According to Oliver, child
processes still exist even though the parent has died. If this is the case,
then PHP FastCGI should be patched, not APC. The PHP FastCGI framework should
kill all child processes after some timeout before the parent dies.


Assuming that we don't have this fix in PHP now:
1. It is not harmful for the parent to clean up the semaphores, because the
processes are dying anyway and the warning is harmless.
2. In APC, if we keep track of an "is_shutting_down" flag, then we can implement
  something like:
  if (semop failed and is_shutting_down) then don't show the message,
   unless verbose mode is enabled.
3. In stop_postcmd(), we have the line:
    while pgrep 'php-cgi' > /dev/null; do :; done;
This workaround assumes that only a single php-cgi server is running. If
somebody is running multiple lighttpd/Apache servers, then all of those will be
killed. A virtual hosting site might be the real-world scenario.

My opinion is that we should fix this issue in PHP FastCGI, not in APC.
FastCGI should kill all children (and make sure all children are killed by
SIGKILL after some timeout) before the parent dies.
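
A rough sketch of that shutdown order in C, under the assumption that the parent knows its children's pids: send SIGTERM, poll for up to a timeout, SIGKILL whatever is left, and reap everything before any semaphore is removed. This is not the code in sapi/cgi/cgi_main.c; stop_children and spawn_idle_child are hypothetical names, and the one-second polling interval is arbitrary.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper: fork a child that idles until it is signalled.
 * With ignore_term set, the child ignores SIGTERM, like a worker that is
 * slow to die. */
pid_t spawn_idle_child(int ignore_term)
{
    void (*old)(int) = SIG_DFL;
    pid_t pid;

    /* Set the disposition before fork() so the child inherits it race-free. */
    if (ignore_term) old = signal(SIGTERM, SIG_IGN);
    pid = fork();
    if (pid == 0) {
        for (;;) pause();
    }
    if (ignore_term) signal(SIGTERM, old);
    return pid;
}

/* SIGTERM all children, wait up to timeout_sec, SIGKILL the stragglers and
 * reap everything. Reaped entries in pids[] are set to 0.
 * Returns how many children needed SIGKILL. */
int stop_children(pid_t *pids, int n, int timeout_sec)
{
    int i, alive = 0, waited = 0, forced = 0;

    for (i = 0; i < n; i++) {
        if (pids[i] > 0) {
            kill(pids[i], SIGTERM);
            alive++;
        }
    }
    while (alive > 0) {
        for (i = 0; i < n; i++) {
            /* Non-zero: the child was reaped (or is no longer ours). */
            if (pids[i] > 0 && waitpid(pids[i], NULL, WNOHANG) != 0) {
                pids[i] = 0;
                alive--;
            }
        }
        if (alive == 0 || waited >= timeout_sec) break;
        sleep(1);
        waited++;
    }
    for (i = 0; i < n; i++) {
        if (pids[i] > 0) {
            kill(pids[i], SIGKILL);
            waitpid(pids[i], NULL, 0);
            pids[i] = 0;
            forced++;
        }
    }
    /* Only at this point would it be safe to remove the semaphores. */
    return forced;
}
```

For example, with one child that honours SIGTERM and one that ignores it, stop_children(pids, 2, 1) should return 1 and leave both entries set to 0.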

Meanwhile, apc can adopt solution 2.
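
Solution 2 could look roughly like the following C sketch. The names is_shutting_down, verbose_mode, reported_errors, and demo_sem_lock are all hypothetical; nothing in this thread confirms that APC has such a flag, and where the flag would be set (for example at the start of module shutdown) is left open.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Hypothetical globals for illustration only. */
int is_shutting_down = 0;   /* would be set when shutdown begins */
int verbose_mode = 0;       /* when set, report failures even during shutdown */
int reported_errors = 0;    /* counts messages actually printed */

/* Tries to lock semaphore 0 of the set. Returns 0 on success, -1 on failure.
 * A failure is only reported when we are not shutting down, or when verbose
 * mode is enabled. */
int demo_sem_lock(int semid)
{
    struct sembuf op;

    op.sem_num = 0;
    op.sem_op = -1;
    op.sem_flg = SEM_UNDO;
    if (semop(semid, &op, 1) == 0) return 0;

    if (!is_shutting_down || verbose_mode) {
        fprintf(stderr, "demo_sem_lock: semop(%d) failed: %s\n",
                semid, strerror(errno));
        reported_errors++;
    }
    return -1;
}
```

This only hides the message; the lock still fails, so callers still have to cope with the -1 return.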
 