php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #57616 ttl/user ttl > 0 causes locking problems
Submitted: 2007-04-08 16:25 UTC Modified: 2009-05-05 14:48 UTC
From: wnxwhitewolf at warriornation dot net Assigned:
Status: No Feedback Package: APC (PECL)
PHP Version: 5.1.6 OS: Linux
Private report: No CVE-ID: None
Have you experienced this issue?
Rate the importance of this bug to you:

 [2007-04-08 16:25 UTC] wnxwhitewolf at warriornation dot net
Description:
------------
I'm not sure if this bug is a duplicate of a previous APC bug, but I'll explain my findings.

Several conditions must be met:
1) Script TTL (and User TTL if cache is used for user entries) > 0 (or non-fcntl locks used)
2) Any locking mechanism (tested with fnctl and sem locks and spinlocks on two 2.6 kernel machines on both php 5.1.6 and 5.2.0)
3) The cache must be filled

Then, once the above conditions are met APC locks the cache and attempts to clear it out but fails. Please note however, that when using any other type of locking than fcntl a TTL of 0 does /not/ fix the issue, although if the default fcntl locks are used a TTL of 0 will clear the cache once it's full as expected.

Errors like the following then appear in the error_log of apache once the apache thread that APC is using to clear out the cache times out (or fails):
[Sat Apr  7 23:00:31 2007] [apc-error] Stuck spinlock (0xb41c0000) detected (spinlocks)
[Thu Apr  5 00:31:55 2007] [apc-error] apc_fcntl_lock failed: Bad file descriptor (fcntl)


Software versions (also on another server with php5.2 and apache 2.0.49, both are apache 2):
PHP 5.1.6 (cli) (built: Mar  7 2007 12:48:25)
Copyright (c) 1997-2006 The PHP Group
Zend Engine v2.1.0, Copyright (c) 1998-2006 Zend Technologies

apache2 -V
Server version: Apache/2.0.55
Server built:   Sep 27 2006 16:52:14
Server's Module Magic Number: 20020903:11
Architecture:   32-bit

I'll watch the ticket and provide more information if needed. (If you want me to gdb, give me specific instructions on exactly how you want it)


Reproduce code:
---------------
<?php
$testfiledata = file_get_contents('testfile'); //some file less than the max value for apc storage
for($i=0;$i<96;$i++) { //some value that will fill the cache, but I think at least one file needs to be cached for this bug to reproduce
apc_store('test'.uniqid(), $testfiledata);
}
?>

Expected result:
----------------
I expect APC to loop through the cache and empty out the cache of all expired scripts, and if that fails and the data can't be added, apc_store() should fail.

Actual result:
--------------
APC enters into an infinite loop (eventually dies, apache2 timeout?) and creates the issue defined in the description.

Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2007-04-09 05:33 UTC] gopalv82 at yahoo dot com
The lock cleanup dying while holding a lock is a known problem for apache2 sapi. The apache2 sapi does not do anything for blocking signals even when HANDLE_BLOCK_INTERRUPTIONS is used.

http://news.php.net/php.internals/17172

I've known about this issue for a while now:

http://t3.dotgnu.info/blog/php/long-lost-todo.html

But I can't relicense the code I used to fix it initially. And rewriting a nearly identical bit of code under php license will put me in a slightly grey area in terms of copyright.
 [2007-04-15 22:17 UTC] scottmac@php.net
I assume we saw this too a few minutes ago, first time its happened in a while.

We were using 5.2.2-dev and APC from CVS.

I grabbed a backtrace and it looked as follows:

#0  0x008b07a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
No symbol table info available.
#1  0x00989a41 in ___newselect_nocancel () from /lib/tls/libc.so.6
No symbol table info available.
#2  0x0028e80d in apc_spin_pg_usleep (microsec=401000) at /www/src/APC-cvs/pgsql_s_lock.c:114
        delay = {tv_sec = 0, tv_usec = 21000}
#3  0x0028e888 in apc_spin_s_lock (lock=0xafcc402c "\001\001", file=0x291733 "/www/src/APC-cvs/apc_spin.c", line=38) at /www/src/APC-cvs/pgsql_s_lock.c:213
        spins = Variable "spins" is not available.

I then checked the apache error logs and it was full of:
[Sun Apr 15 21:04:03 2007] [apc-error] Stuck spinlock (0xafcc402c) detected
 [2007-05-11 21:09 UTC] sean at ertw dot com
I just tried moving from eAccelerator to APC.  After a little while the cache filled up and I started getting:

[Fri May 11 11:43:33 2007] [apc-error] apc_fcntl_rdlock failed: Bad file descriptor

(but nothing about spinlocks)

The only thing I tried was adding apc.mmap_file_mask, previously I didn't have it (config below)

I'm just doing script caching, no user variable caching.

APC 3.0.14
PHP 5.1.6
Apache 2.0.52

APC config:
extension=apc.so
apc.mmap_file_mask="/tmp/apcphp5.XXXXXX"
apc.file_update_protection="2"
apc.localcache=1
apc.num_files_hint=6000
apc.shm_size=1024
apc.ttl=300
apc.write_lock=1
 [2007-05-13 15:00 UTC] gopalv82 at yahoo dot com
@sean: Try turning off localcache. And I'm slightly scared of you if you managed to fill up a gig of cache :)
 [2007-11-12 09:22 UTC] cboitel at gmail dot com
I have the exact same issue with PHP 5.2.4 and APC 3.0.14

While investigating APC source, i found issue in apc_cache_expunge which is still left in current CVS version.

If cache ttl is set, expunge will walk through the cache entries to identify the ones to be cleared (i.e thoses having ttl reached) but it will loop for ever. After 30s (PHP's max_execution_time), request causing the expunge is cancelled but lock is left and no further PHP requests can be served.

Code extract with line numbers:
    413     for (i = 0; i < cache->num_slots; i++) {
    414             p = &cache->slots[i];
    415             while(*p) {
    416                 /*
    417                  * For the user cache we look at the individual entry ttl values
    418                  * and if not set fall back to the default ttl for the user cache
    419                  */
    420                 if((*p)->value->type == APC_CACHE_ENTRY_USER) {
    421                     if((*p)->value->data.user.ttl) {
    422                         if((*p)->creation_time + (*p)->value->data.user.ttl < t) {
    423                             remove_slot(cache, p);
    424                             continue;
    425                         }
    426                     } else if(cache->ttl) {
    427                         if((*p)->creation_time + cache->ttl < t) {
    428                             remove_slot(cache, p);
    429                             continue;
    430                         }
    431                     }
    432                 } else if((*p)->access_time < (t - cache->ttl)) {
    433                     remove_slot(cache, p);
    434                     continue;
    435                 }
    436                 p = &(*p)->next;
    437             }
    438         }


Pb is that expunge will has a continue statement after each remove_slot:
=> this will force the code to loop again within the while statement
=> since the while statement condition hasn't changed (p isn't altered), it will loop forever


Workaround: set ttls to 0 meanwhile it is fixed.
 [2007-11-12 09:37 UTC] rasmus@php.net
remove_slot() does change *p though:

static void remove_slot(apc_cache_t* cache, slot_t** slot)
{
    slot_t* dead = *slot;
    *slot = (*slot)->next;
 [2007-12-31 07:50 UTC] christian dot boitel at gmail dot com
Shouldn't it be instead the followings ?

static void remove_slot(apc_cache_t* cache, slot_t** slot)
{
    slot_t* dead = *slot;
    *slot = &(*slot)->next;

instead of 
    *slot = (*slot)->next;

alike it is done in expunge function:
     p = &(*p)->next;
 [2008-01-08 07:48 UTC] gopalv82 at yahoo dot com
slot_t** slot passed into remove_slot is actually populated as

slot = &(*slot)->next;

So, 

*slot = (*slot)->next

would be writing into the previous slot's next ptr, skipping current slot in the process.

Hope that makes sense (I'll re-check the code again to ensure it is used that way).
 [2008-01-29 12:07 UTC] jamon dot dot terrell at intraisp dot com (remove a dot )
We're also seeing this problem, however setting all TTLs to 0  did not solve it for us.  We've tried several configurations, though this is our current config which is still having the problem.  (We have stopped using user cache entirely, and it's still occurring, as well).

[apc]
apc.enable_cli=1
apc.enabled=1
apc.mmap_file_mask=/tmp/apc.XXXXXX
apc.num_files_hint=1024
apc.shm_segments=1
apc.shm_size=64
apc.ttl=0
apc.user_ttl=0
apc.gc_ttl=0
 [2009-03-13 21:29 UTC] shire@php.net
Can you try the latest CVS version, there are couple fixes there that might help this situation.
 [2009-05-05 14:48 UTC] shire@php.net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.


 
PHP Copyright © 2001-2021 The PHP Group
All rights reserved.
Last updated: Mon Dec 06 06:03:37 2021 UTC