Bug #58459 [apc-error] Stuck spinlock (0xa7791fc0) detected
Submitted: 2008-12-15 22:56 UTC Modified: 2009-02-28 16:31 UTC
From: paulgao at yeah dot net Assigned:
Status: Closed Package: APC (PECL)
PHP Version: 5.2.5 OS: Slackware
Private report: No CVE-ID: None
 [2008-12-15 22:56 UTC] paulgao at yeah dot net
Description:
------------
my php.ini
[apc]
apc.enable_cli            = 1
apc.include_once_override = Off
apc.shm_size              = 256

Actual result:
--------------
Dec 16 11:52:18.333062 [WARNING] fpm_stdio_child_said(), line 167: child 1676 (pool default) said into stderr: "[Tue Dec 16 11:52:18 2008] [apc-error] Stuck spinlock (0xa7791fc0) detected in /home/codebase/framework/library/framework.class.php on line 7."

 [2008-12-16 01:46 UTC] paulgao at yeah dot net
I use File-Lock :-(

Dec 16 14:39:44.245282 [WARNING] fpm_stdio_child_said(), line 167: child 13322 (pool default) said into stderr: "[Tue Dec 16 14:39:44 2008] [apc-error] apc_fcntl_lock failed: Resource deadlock avoided in /home/codebase/framework/library/framework.class.php on line 7."
 [2008-12-16 02:02 UTC] shire@php.net
Do you have any errors (segfaults, dying processes, etc) before this happens?  

Are you able to narrow this down to section of code that can reproduce this behavior?

Which version of APC are you using? (can you try a previous version if you're on the latest, or the latest beta release?)
 [2008-12-16 03:08 UTC] paulgao at yeah dot net
Yes, there are segfault messages:

Dec 14 23:11:25 IImobile-SV25-B41 kernel: php-cgi[4692]: segfault at 00000000 eip 08281b3b esp bfa5d330 error 4
Dec 15 15:01:17 IImobile-SV25-B41 kernel: php-cgi[9224]: segfault at 00000000 eip 08281b3b esp bfa5d330 error 4
Dec 15 15:40:55 IImobile-SV25-B41 kernel: php-cgi[9394] general protection eip:b7c7132c esp:bfa5def8 error:0
Dec 16 10:47:56 IImobile-SV25-B41 kernel: php-cgi[1577]: segfault at 00000001 eip b7ca6155 esp bff013b0 error 4
Dec 16 10:48:00 IImobile-SV25-B41 kernel: php-cgi[1584]: segfault at 00000001 eip b7ca6155 esp bff013b0 error 4
Dec 16 12:33:35 IImobile-SV25-B41 kernel: php-cgi[19549]: segfault at b7bad418 eip 082ca061 esp bfc5e430 error 7

I now use APC 3.1.2, but APC 3.0.19 has the same problem.

But the problem only appears on 32-bit systems (Slackware 12.1). 64-bit systems (CentOS 5.2, 64-bit version) are OK.

"Are you able to narrow this down to section of code that can reproduce this behavior?"
- The code is very simple; it only uses the apc_fetch and apc_store functions. But the system load is very heavy.

When I use apc.php to watch APC's running status, the "Cached Files" value frequently drops to 0, and the "Cache full count" value grows very quickly. This behavior is strange compared to our stable systems.
 [2008-12-16 03:25 UTC] shire@php.net
This is a sign that you are exceeding your cache size.  If you have more memory, try increasing your cache size.  Alternatively, you could exclude some files using the apc.filters INI configuration option.  When the cache is full it will expunge, and if your system is under high load this can cause problems.
 [2008-12-16 03:55 UTC] paulgao at yeah dot net
>> This is a sign that you are exceeding your cache size.

But there was still a lot of free memory when that problem occurred, according to what apc.php reported.

>> Alternatively you could exclude some files using the filters INI
>> configuration option.  When the cache is full it will expunge, if your
>> system is under high load this can cause problems.

The file cache is quite a small part of the overall cache (about 60M on the stable system), 
so exceeding the cache size is almost impossible.

On 64-bit systems, apc.php always reports a file cache full count of 0.

While the file cache status seemed strange, the user cache status was normal.

And I think increasing the cache size will only delay the occurrence of the problem; the problem itself will remain.

Please tell me if there is any information I can provide to help diagnose this issue. 

thank you!
 [2008-12-17 00:32 UTC] paulgao at yeah dot net
I think I should give you more background information about our system to help you think about this issue.

Currently we have four servers running 64-bit CentOS(version 5.2) , and another four servers running 32-bit Slackware(version 12.1).

The code is identical on all of these servers, synchronized by rsync.

We compiled PHP 5.2.8 with Suhosin and APC statically. We use PHP-FPM to manage the PHP-CGI process.

PHP-FPM, Suhosin and APC have been updated to the latest released version.

Our website receives over 10 million dynamic requests per day.

When our site first went online, all the servers (32-bit and 64-bit) would crash. At that time APC 3.0.19 was in use, but we found that the crash was due to a bug in PHP 5.2.6. After upgrading to PHP 5.2.8, the problem (frequent crashes under high load) only occurs on our 32-bit servers.

On those 32-bit servers, the php-cgi process runs smoothly when the system load is normal, but crashes when the load goes high.

Currently we have to give those 32-bit servers much lower weight on our load-balancing system to prevent such crashes.

On those 32-bit servers, according to what apc.php reported, there is still free memory when the crash happens, but the file cache status is strange while the user cache status is normal (I can provide some screenshots if needed).

On those 64-bit servers, this problem happens occasionally, with the same strange status: the file cache fill count is always 0.

After some comparisons between our 32-bit and 64-bit servers, I personally think this issue may have something to do with APC's lock mechanism, or there are some defects in cleaning the APC cache when the cache is full under high load.

I can provide more information (debug output, logs, etc.) if needed, and I would very much like to work with you to find the source of this annoying issue.

To make APC more stable is our mutual goal. :-)

btw: for now, cache size is 384M, but it seems the 32-bit server crashes more frequently. :(
 [2008-12-17 01:13 UTC] gopalv82 at yahoo dot com
a gdb backtrace won't hurt - for the segvs.

I don't do any testing with suhosin on, but at least it would be nice to know where the segv is happening.
 [2008-12-17 01:38 UTC] rasmus@php.net
And without Suhosin?  This code has never been tested with Suhosin.
 [2008-12-17 03:29 UTC] paulgao at yeah dot net
I will try to provide more information (e.g. a backtrace) later, but why is it that this problem only happens on the 32-bit servers? With the same code and settings, our 64-bit servers rarely crash, although they do sometimes crash with the same strange status.
 [2008-12-17 05:03 UTC] gopalv82 at yahoo dot com
Try to get me a backtrace from 3.1.2 builds, the 3.0.x has been dead for six months (for me).
 [2008-12-17 22:44 UTC] paulgao at yeah dot net
Now I've enabled PHP's debug mode, and the "Stuck spinlock" error temporarily disappears, even after I limit the cache size to 32M.

But I got another two "core dump" errors.

The first error is quite simple:

(gdb) bt
#0 0x08364b55 in _zend_is_inconsistent (ht=Cannot access memory at address 0xbfb3ab90
) at /root/php-5.2.8/Zend/zend_hash.c:54
Cannot access memory at address 0xbfb3ab8c

The second is more complex:

(gdb) bt
#0 0x080d22f7 in apc_copy_op_array (dst=0x0, src=0x898a844, ctxt=0xbfb77f30) at /root/php-5.2.8/ext/apc/apc_compile.c:881
#1 0x080d473e in my_compile_file (h=0xbfb77fe8, type=8) at /root/php-5.2.8/ext/apc/apc_main.c:431
#2 0x08388d4a in ZEND_INCLUDE_OR_EVAL_SPEC_TMP_HANDLER (execute_data=0xbfb781b8) at /root/php-5.2.8/Zend/zend_vm_execute.h:4616
#3 0x0837cf9d in execute (op_array=0x8a2e428) at /root/php-5.2.8/Zend/zend_vm_execute.h:92
#4 0x082af79b in suhosin_execute_ex (op_array=0x8a2e428, zo=0, dummy=0) at /root/php-5.2.8/ext/suhosin/execute.c:562
#5 0x082af7c8 in suhosin_execute (op_array=0x8a2e428) at /root/php-5.2.8/ext/suhosin/execute.c:574
#6 0x0837d512 in zend_do_fcall_common_helper_SPEC (execute_data=0xbfb78568) at /root/php-5.2.8/Zend/zend_vm_execute.h:234
#7 0x083826c6 in ZEND_DO_FCALL_SPEC_CONST_HANDLER (execute_data=0xbfb78568) at /root/php-5.2.8/Zend/zend_vm_execute.h:1729
#8 0x0837cf9d in execute (op_array=0x8893ad8) at /root/php-5.2.8/Zend/zend_vm_execute.h:92
#9 0x082af79b in suhosin_execute_ex (op_array=0x8893ad8, zo=0, dummy=0) at /root/php-5.2.8/ext/suhosin/execute.c:562
#10 0x082af7c8 in suhosin_execute (op_array=0x8893ad8) at /root/php-5.2.8/ext/suhosin/execute.c:574
#11 0x0837d512 in zend_do_fcall_common_helper_SPEC (execute_data=0xbfb7bd08) at /root/php-5.2.8/Zend/zend_vm_execute.h:234
#12 0x0837dea5 in ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER (execute_data=0xbfb7bd08) at /root/php-5.2.8/Zend/zend_vm_execute.h:322
#13 0x0837cf9d in execute (op_array=0x88a18d4) at /root/php-5.2.8/Zend/zend_vm_execute.h:92
#14 0x082af79b in suhosin_execute_ex (op_array=0x88a18d4, zo=0, dummy=0) at /root/php-5.2.8/ext/suhosin/execute.c:562
#15 0x082af7c8 in suhosin_execute (op_array=0x88a18d4) at /root/php-5.2.8/ext/suhosin/execute.c:574
#16 0x0835c29e in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /root/php-5.2.8/Zend/zend.c:1215
#17 0x0831287b in php_execute_script (primary_file=0xbfb80100) at /root/php-5.2.8/main/main.c:2044
#18 0x083d0fa5 in main (argc=4, argv=0xbfb80254) at /root/php-5.2.8/sapi/cgi/cgi_main.c:2118

I hope these backtraces could help analyze the problem. I would disable the suhosin extension and try again later.
 [2008-12-18 03:02 UTC] gopalv82 at yahoo dot com
881 CHECK(dst = (zend_op_array*) apc_pool_alloc(pool, sizeof(src[0])));

could you check values of pool, pool->palloc, pool->pfree ?

Because according to what I think, that segv is because you ran out of memory & the pool creation failed.
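
The failure mode suggested here, an allocation that returns NULL and is then used anyway, can be sketched in C roughly as follows. The names `demo_pool`, `demo_pool_alloc`, and `demo_pool_copy` are hypothetical stand-ins and not APC's `apc_pool` API; the sketch assumes a simple bump allocator and ignores alignment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size pool for illustration; NOT APC's apc_pool. */
typedef struct {
    unsigned char *base;   /* start of the backing memory */
    size_t capacity;       /* total bytes available */
    size_t used;           /* bytes already handed out (used <= capacity) */
} demo_pool;

/* Returns NULL when the pool is missing or has no room left.
   Alignment is ignored to keep the sketch short. */
static void *demo_pool_alloc(demo_pool *pool, size_t size)
{
    if (pool == NULL || size > pool->capacity - pool->used) {
        return NULL;
    }
    void *p = pool->base + pool->used;
    pool->used += size;
    return p;
}

/* Copies src into pool memory. On allocation failure it returns NULL
   instead of writing through a null pointer. */
static void *demo_pool_copy(demo_pool *pool, const void *src, size_t size)
{
    void *dst = demo_pool_alloc(pool, size);
    if (dst == NULL) {
        return NULL;   /* the caller decides how to recover */
    }
    memcpy(dst, src, size);
    return dst;
}
```

If the NULL check in `demo_pool_copy` were missing, an exhausted or absent pool would lead to a write through a null pointer, which is the kind of crash a backtrace would show at the allocation site.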
 [2008-12-18 05:33 UTC] paulgao at yeah dot net
I think what this backtrace implies can explain one of the problems I mentioned previously. 

That is, the file cache full count would suddenly grow high when the cache is full.

I will compile the latest code from CVS and report the test results soon.

Thank you all for your attention to this problem.
 [2008-12-22 22:54 UTC] paulgao at yeah dot net
The "Stuck spinlock" error occurred again after some days of running, but unfortunately, it seems no "core dump" log was found.

We've already compiled PHP using the latest CVS source code. 

Could anyone give me any tips or hints?
 [2009-02-16 19:41 UTC] shire@php.net
Is your backtrace from 3.1.2?  

Did you test without Suhosin as requested?  

Is your system low on free memory?  (I've run into this error on systems that are under heavy memory utilization or are making significant use of /tmp. Do you have any "Bus signal" errors in your log?)
 [2009-02-17 02:41 UTC] paulgao at yeah dot net
1. We are currently using a CVS version of APC which has no memory-protect or binary file features (a snapshot from January 2009). Under certain load, the "stuck spinlock" error appears on the latest CVS version of APC, but not on our current version.

2. The "stuck spinlock" error is not a fatal error, so there's no core dump. It just occurs when a timeout is reached. I don't see any "Bus Signal" errors.

3. There was once an improvement in memory fragmentation, but it seems that the improvement has disappeared from the latest CVS version (the fragmentation increases greatly after a period of running).

4. The "stuck spinlock" error remains after we remove the Suhosin extension.
 [2009-02-17 03:28 UTC] shire@php.net
How many of these errors do you typically see?  Is the overall system performance poor when this happens?  Does it seem to be correlated with fragmentation? (Fragmentation shouldn't cause significant performance loss in recent versions.)

It's possible (I've seen this on my systems) that this is related to an overall system slowdown rather than something wrong with APC.  The spin-locks are set to time out after a certain time and could do so simply because of a process being starved of CPU. However if you are certain you aren't seeing this in the latest version perhaps there's a performance regression somewhere.  

It might be interesting to know how your system behaves with pthread locking enabled rather than spin locks.  These behave differently and won't time out.
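
As a rough illustration of why a spin lock with a timeout can report "stuck" without any real deadlock, here is a minimal C11 sketch. The names `demo_spinlock`, `demo_spin_acquire`, and `demo_spin_release` are hypothetical, and this is not APC's actual lock code; it only assumes a lock that gives up after a bounded number of attempts.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical illustration only; NOT APC's spin lock implementation. */
typedef struct {
    atomic_flag held;   /* set while some process or thread owns the lock */
} demo_spinlock;

/* Tries to take the lock, giving up after max_spins failed attempts.
   Returns true on success and false if the budget ran out, which is the
   point where a real implementation might log a "stuck" warning. */
static bool demo_spin_acquire(demo_spinlock *lock, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        /* test_and_set returns the previous value: false means we got it */
        if (!atomic_flag_test_and_set_explicit(&lock->held,
                                               memory_order_acquire)) {
            return true;
        }
    }
    return false;   /* the holder has not released yet; it may just be slow */
}

static void demo_spin_release(demo_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

A caller would initialize the lock with `demo_spinlock l = { ATOMIC_FLAG_INIT };` and treat a `false` return as the equivalent of the warning discussed above. If the holder is merely starved of CPU, the waiter exhausts its budget even though nothing is deadlocked; a blocking mutex has no such budget and would simply keep waiting.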
 [2009-02-18 22:57 UTC] shire@php.net
I just made a couple commits that fix segfaults and dead-locks, can you please try latest CVS and report if it fixes anything for you?
 [2009-02-19 05:46 UTC] paulgao at yeah dot net
Thank you, shire.

I've deployed the latest CVS code of APC to our production server, and it seems more stable now. 

I will report back if any error happens.

Moreover, I have the following questions:

1. After benchmarking with ab, we found a small performance loss with the latest APC deployed, so could APC be made even faster?
2. Could anyone look at Bug #15356 (http://pecl.php.net/bugs/bug.php?id=15356), which I reported earlier? I hope this annoying bug can be fixed soon.
 [2009-02-19 14:31 UTC] shire@php.net
Great, please let me know if your error comes back, we could have other issues lurking.

It's possible we had a perf. regression somewhere in the code. Can you open another bug (just to keep this one simple) and provide details on what you were benchmarking (just the file cache, i.e. loading pages, storing values, fetching values)?  Also note between which two specific versions you saw this.

I'll update the other bug you mentioned.
 [2009-02-28 16:31 UTC] shire@php.net
Closing this bug for now as it sounds like this is working for you, if you see this again please let us know.
 