php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #64144 segmentation fault multi-get ascii-protocol + high network latency or no answer
Submitted: 2013-02-04 17:50 UTC Modified: 2021-03-25 16:46 UTC
Votes:16
Avg. Score:4.9 ± 0.5
Reproduced:14 of 14 (100.0%)
Same Version:11 (78.6%)
Same OS:10 (71.4%)
From: rene dot kerner at trivago dot com Assigned: cmb (profile)
Status: Closed Package: memcache (PECL)
PHP Version: Irrelevant OS: Ubuntu/freeBSD, irrelevant
Private report: No CVE-ID: None
View Add Comment Developer Edit
Anyone can comment on a bug. Have a simpler test case? Does it work for you on a different platform? Let us know!
Just going to say 'Me too!'? Don't clutter the database with that please !
Your email address:
MUST BE VALID
Solve the problem:
41 - 8 = ?
Subscribe to this entry?

 
 [2013-02-04 17:50 UTC] rene dot kerner at trivago dot com
Description:
------------
tested with: memcache-3.0.6, memcache-3.0.7

We have a complex application with many webservers and some not too small memcache-pools.

On all systems (dev and prod), from time to time, we got segmentation faults (about 40-90 a day, but sometimes 300-400 per day).

On further investigation we find out that the problem was a late (after the default timeout of 1 second) or not received answer of the memcache server or another kind of network/request latency on multi-get (ascii-protocol).

It's happening on memcache-3.0.6 and 3.0.7 php extension using the ascii protocol.
The segfaults are gone when the memcache answer is fast and arrives, or on using binary protocol or on single-get in ascii-protocol.

We were able to reproduce this bug with adding a latency to the incoming traffic.

to simulate network latency see:
http://www.linuxfoundation.org/collaborate/workgroups/networking/netem


Test script:
---------------
<?php
// ...

$_aKeys = array ('Item-123::en::5123', 'Item-123::en::5678');

// set incoming traffic latency to 1200ms, memcache standard timeout is 1s = 1000 ms; "tc" is allowed in sudoers file to make this example work [or apache run as root]

exec("sudo /sbin/tc qdisc add dev ifb0 root netem delay 1200ms");

$_mResult = $memcache->get($_aKeys);

// remove network latency
exec("sudo /sbin/tc qdisc del dev ifb0 root netem");

?>

Expected result:
----------------
No segmentation fault and returns an empty array() or false or something, correct failure handling.

Actual result:
--------------
Segmentation fault.

CoreDump-Backtrace:

Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f08de57d0a2 in mmc_value_handler_multi (key=0x7f08efef1890 "S::1456589084::Item-DE::de::13::91565", key_len=37, value=0x7fff532734f0, flags=0, cas=0, param=<optimized out>) at /var/www/trivago/orchestra_trunk/memcache-3.0.6/memcache.c:1517
1517            if (Z_TYPE_P(result[0]) != IS_ARRAY) {
(gdb) bt
#0  0x00007f08de57d0a2 in mmc_value_handler_multi (key=0x7f08efef1890 "S::1456589084::Item-DE::de::13::91565", key_len=37, value=0x7fff532734f0, flags=0, cas=0, param=<optimized out>) at /var/www/trivago/orchestra_trunk/memcache-3.0.6/memcache.c:1517
#1  0x00007f08de5818f1 in mmc_unpack_value (mmc=<optimized out>, request=0x7f08efef16d8, buffer=0x7f08efef1700, key=0x7f08efef1890 "S::1456589084::Item-DE::de::13::91565", key_len=37, flags=0, cas=0, bytes=548)
    at /var/www/trivago/orchestra_trunk/memcache-3.0.6/memcache_pool.c:525
#2  0x00007f08de587074 in mmc_server_read_value (request=0x7f08efef16d8, mmc=0x7f08ee4b4c30) at /var/www/trivago/orchestra_trunk/memcache-3.0.6/memcache_ascii_protocol.c:189
#3  mmc_server_read_value (mmc=0x7f08ee4b4c30, request=0x7f08efef16d8) at /var/www/trivago/orchestra_trunk/memcache-3.0.6/memcache_ascii_protocol.c:176
#4  0x00007f08de583ac3 in mmc_pool_select (pool=0x7f08ef059790) at /var/www/trivago/orchestra_trunk/memcache-3.0.6/memcache_pool.c:1584
#5  0x00007f08de584538 in mmc_pool_run (pool=0x7f08ef059790) at /var/www/trivago/orchestra_trunk/memcache-3.0.6/memcache_pool.c:1670
#6  0x00007f08de57eb27 in php_mmc_store (ht=<optimized out>, return_value=0x7f08f01f4bd0, this_ptr=<optimized out>, op=1, return_value_ptr=<optimized out>, return_value_used=<optimized out>) at /var/www/trivago/orchestra_trunk/memcache-3.0.6/memcache.c:521
#7  0x00007f08e982889d in zend_do_fcall_common_helper_SPEC (execute_data=0x7f08e7958700) at /build/buildd/php5-5.3.10/Zend/zend_vm_execute.h:320
#8  0x00007f08e97d940b in execute (op_array=0x7f08eef4a490) at /build/buildd/php5-5.3.10/Zend/zend_vm_execute.h:107
#9  0x00007f08e97a694c in zend_call_function (fci=0x7fff53273aa0, fci_cache=0x7f08e79548f8) at /build/buildd/php5-5.3.10/Zend/zend_execute_API.c:969
#10 0x00007f08e96e92b8 in zif_call_user_func_array (ht=0, return_value=0x7f08f01e8c90, return_value_ptr=0x7f08f01f6918, this_ptr=0x7f08f01f6940, return_value_used=-2096513353) at /build/buildd/php5-5.3.10/ext/standard/basic_functions.c:4803
#11 0x00007f08e982889d in zend_do_fcall_common_helper_SPEC (execute_data=0x7f08e79548f8) at /build/buildd/php5-5.3.10/Zend/zend_vm_execute.h:320
#12 0x00007f08e97d940b in execute (op_array=0x7f08ee72d790) at /build/buildd/php5-5.3.10/Zend/zend_vm_execute.h:107
#13 0x00007f08e97a694c in zend_call_function (fci=0x7fff53273da0, fci_cache=0x7f08e7951950) at /build/buildd/php5-5.3.10/Zend/zend_execute_API.c:969
#14 0x00007f08e96e9041 in zif_call_user_func (ht=0, return_value=0x7f08efaf35e0, return_value_ptr=0x7f08f01f6918, this_ptr=0x7f08f01f6940, return_value_used=-2096513353) at /build/buildd/php5-5.3.10/ext/standard/basic_functions.c:4778
#15 0x00007f08e982889d in zend_do_fcall_common_helper_SPEC (execute_data=0x7f08e7951950) at /build/buildd/php5-5.3.10/Zend/zend_vm_execute.h:320
#16 0x00007f08e97d940b in execute (op_array=0x7f08efe58840) at /build/buildd/php5-5.3.10/Zend/zend_vm_execute.h:107
#17 0x00007f08e97a694c in zend_call_function (fci=0x7fff53274090, fci_cache=0x7f08e794e078) at /build/buildd/php5-5.3.10/Zend/zend_execute_API.c:969
#18 0x00007f08e96e92b8 in zif_call_user_func_array (ht=0, return_value=0x7f08ee61f7b0, return_value_ptr=0x7f08f01f6918, this_ptr=0x7f08f01f6940, return_value_used=-2096513353) at /build/buildd/php5-5.3.10/ext/standard/basic_functions.c:4803
#19 0x00007f08e982889d in zend_do_fcall_common_helper_SPEC (execute_data=0x7f08e794e078) at /build/buildd/php5-5.3.10/Zend/zend_vm_execute.h:320
#20 0x00007f08e97d940b in execute (op_array=0x7f08ee72d790) at /build/buildd/php5-5.3.10/Zend/zend_vm_execute.h:107
#21 0x00007f08e97b4970 in zend_execute_scripts (type=32520, retval=0x800000000, file_count=3) at /build/buildd/php5-5.3.10/Zend/zend.c:1308
#22 0x00007f08e9760fe3 in php_execute_script (primary_file=0x7f08ec8230ac) at /build/buildd/php5-5.3.10/main/main.c:2323
#23 0x00007f08e9843e7d in php_handler (r=0x7f08e9843e7d) at /build/buildd/php5-5.3.10/sapi/apache2handler/sapi_apache2.c:688
#24 0x00007f08ed6aa508 in ap_run_handler ()
#25 0x00007f08ed6aa97e in ap_invoke_handler ()
#26 0x00007f08ed6ba570 in ap_process_request ()
#27 0x00007f08ed6b7398 in ?? ()
#28 0x00007f08ed6b0fa8 in ap_run_process_connection ()
#29 0x00007f08ed6bf1d0 in ?? ()
#30 0x00007f08ed6bf93a in ?? ()
#31 0x00007f08ed6bf9f7 in ?? ()
#32 0x00007f08ed6c0374 in ap_mpm_run ()
#33 0x00007f08ed6954a4 in main ()


process strace:

socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 26
fcntl(26, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(26, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
connect(26, {sa_family=AF_INET, sin_port=htons(11212), sin_addr=inet_addr("192.168.1.53")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=26, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 1000) = 1 ([{fd=26, revents=POLLOUT}])
getsockopt(26, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
fcntl(26, F_SETFL, O_RDWR)              = 0
fcntl(26, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(26, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
select(27, [26], [26], NULL, {1, 0})    = 1 (out [26], left {0, 999998})
sendto(26, "set S::1456589084::Item-DE::de::13::1831719 0 21600 492\r\nx\332\205SMo\2430\20\375+\21\367\2550\204\0\316i7J\272\221\262I\324V]i\325\313\2448t\24\260\221\355dSU\375\357;\6Ci/{A\366{\303{\363\345\35\217\246<x\320x\201R=-\245E\373\372\264\266\242n?\1\217S\376fx\314\3\\\27\301\0349\313b\226\262|nxF\330\255P\253\nJG$\4\245\4\335[\260g\343\220o,$\2141\367/i=\2746\302k\20\234\363\340\270\1r;\27\"\350\356\323\360f\26\317\2420tw\26\272\0%\313q\4Ko\3628\313\363\274\215\240\274qw\21\32\252j\203'\224m\32\241\317\354\7\30Q\354d\17%\4\335\201<\365\367\334\337\27Tn\217\261"..., 551, MSG_NOSIGNAL, NULL, 0) = 551
sendto(26, "get S::1456589084::Item-DE::de::13::1831719\r\n", 45, MSG_NOSIGNAL, NULL, 0) = 45
select(27, [26], [], NULL, {1, 0})      = 1 (in [26], left {0, 999931})
recvfrom(26, "STORED\r\n", 32768, 0, NULL, NULL) = 8
recvfrom(26, "VALUE S::1456589084::Item-DE::de::13::1831719 0 492\r\nx\332\205SMo\2430\20\375+\21\367\2550\204\0\316i7J\272\221\262I\324V]i\325\313\2448t\24\260\221\355dSU\375\357;\6Ci/{A\366{\303{\363\345\35\217\246<x\320x\201R=-\245E\373\372\264\266\242n?\1\217S\376fx\314\3\\\27\301\0349\313b\226\262|nxF\330\255P\253\nJG$\4\245\4\335[\260g\343\220o,$\2141\367/i=\2746\302k\20\234\363\340\270\1r;\27\"\350\356\323\360f\26\317\2420tw\26\272\0%\313q\4Ko\3628\313\363\274\215\240\274qw\21\32\252j\203'\224m\32\241\317\354\7\30Q\354d\17%\4\335\201<\365\367\334\337\27Tn\217\261\250\7\325Y"..., 32768, 0, NULL, NULL) = 552
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
chdir("/etc/apache2")                   = 0
rt_sigaction(SIGSEGV, {SIG_DFL, [], SA_RESTORER|SA_INTERRUPT, 0x7fc5e652ecb0}, {SIG_DFL, [], SA_RESTORER|SA_RESETHAND, 0x7fc5e652ecb0}, 8) = 0
kill(7835, SIGSEGV)                     = 0
rt_sigreturn(0x1e9b)                    = 25769803777
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
Process 7835 detached


Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2014-11-05 20:09 UTC] barry dot jaspan at acquia dot com
This bug is still present in memcache-3.0.8. I was able to track it down more precisely in the extension source code. The hint from rene dot kerner at trivago dot com that the problem was related to network delay was very helpful, thank you. :-)

Here is the sequence of events leading to the bug:

1. PHP calls zif_memcache_get() with an array of keys. zif_memcache_get() allocates "void *value_handler_param[3]" on the stack, and then calls mmc_pool_schedule_get() with mmc_value_handler_multi and value_handler_param as arguments, along with the failover handler mmc_value_failover_handler. zif_memcache_get() then ends by calling mmc_pool_run().
2. mmc_pool_run() sees no items on the pending queue, so it just calls mmc_pool_select().
3. mmc_pool_select() writes the request, then times out waiting for the response.
4. The timeout handling code calls mmc_select_failure(), which calls mmc_server_deactivate(), which calls the request's failover handler, mmc_value_failover_handler().
5. mmc_value_failover_handler() "re-schedules the key if it does not exist in the return value array," with this fateful code that copies request->value_handle_param, which is a pointer to zif_memcache_get()'s stack frame.

			mmc_pool_schedule_get(pool, MMC_PROTO_UDP,
				value_handler_param[2] != NULL ? MMC_OP_GETS : MMC_OP_GET, *key,
				request->value_handler, request->value_handler_param,
				request->failover_handler, request->failover_handler_param, request TSRMLS_CC);

6. mmc_pool_schedule_get() pushes the current server onto the pool's pending queue, with its mmc->buildreq structure set to a new request including the value_handler_param from zif_memcache_get()'s stack frame.
7. mmc_pool_select() returns back to mmc_pool_run(). mmc_pool_run() only loops until reading->len and sending->len are zero. They are, even though the pending queue is not empty, so mmc_pool_run() returns back to PHP.

... deep breath ...

8. PHP calls any other memcache function, such as zif_memcache_set().
9. mmc_pool_run() sees an item on the pending queue; this is the request structure with the pointer to the stack frame of the previous invocation of zif_memcache_get(). It passes this request to mmc_pool_schedule(), putting it on the reading and sending queue, then calls mmc_pool_select().
10. Data arrives from the previously delayed server. mmc_value_handler_multi() gets called, and is passed the value_handler_param with the invalid stack pointer.
11. On memcache.c:1535, mmc_value_handler_multi tries to read from the invalid stack pointer from value_handler_param and gets a segfault.

Here is a test program that reproduces the crash.

<?php

// Servers running memcached. More than one is required to trigger a segfault.
$hosts = array('web-5.bjaspan.ahserversdev.com', 'web-6.bjaspan.ahserversdev.com');
// Network devices used to reach all servers in $hosts.
$devices = array('lo', 'eth0');

// Add all the servers. This does not actually open connections.
$mc = new Memcache;
foreach ($hosts as $host) {
  $mc->addServer($host, 11211);
}

// Open connections and issue a get request.
$mc->get(array('mykey', 'mykey2'), MEMCACHE_COMPRESSED);

// Slow down the network and issue a get request that will time out.
foreach ($devices as $dev) {
  exec("sudo /sbin/tc qdisc add dev $dev root netem delay 5000ms");
}
$mc->get(array('mykey', 'mykey2'), MEMCACHE_COMPRESSED);
foreach ($devices as $dev) {
  exec("sudo /sbin/tc qdisc del dev $dev root netem");
}

// Issue an unrelated request, which will segfault because the previous timed
// out request is still on the pool's pending queue, with an invalid pointer
// to the stack.
$mc->set('mykey', 'myval', MEMCACHE_COMPRESSED, 0);
?>

Here is the crash and stack trace under PHP 5.3. The crash happens with the latest builds of PHP 5.3, 5.5, and 5.6, though the exact location of the crash is different (presumably because the memory layout is different). 

# gdb /usr/local/php5.3/bin/php
GNU gdb (GDB) 7.1-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/php5.3/bin/php...done.
(gdb) run mc2.php
Starting program: /usr/local/php5.3/bin/php mc2.php
[Thread debugging using libthread_db enabled]
[New Thread 0x7fffecd87700 (LWP 21089)]
[Thread 0x7fffecd87700 (LWP 21089) exited]
PHP Notice:  MemcachePool::get(): Server web-6.bjaspan.ahserversdev.com (tcp 11211, udp 0) failed with: Network timeout (0) in /root/mc2.php on line 21

Program received signal SIGSEGV, Segmentation fault.
0x00007fffed63758d in mmc_value_handler_multi (key=0x166c950 "mykey",
    key_len=5, value=0x1669fb0, flags=0, cas=0, param=0x7fffffffac80)
    at /tmp/pear/temp/memcache/memcache.c:1535
1535		if (Z_TYPE_P(result[0]) != IS_ARRAY) {
(gdb) where
#0  0x00007fffed63758d in mmc_value_handler_multi (key=0x166c950 "mykey",
    key_len=5, value=0x1669fb0, flags=0, cas=0, param=0x7fffffffac80)
    at /tmp/pear/temp/memcache/memcache.c:1535
#1  0x00007fffed63ac93 in mmc_unpack_value (mmc=0x1689470, request=0x166c798,
    buffer=0x166c7c0, key=0x166c950 "mykey", key_len=5, flags=0, cas=0,
    bytes=5) at /tmp/pear/temp/memcache/memcache_pool.c:529
#2  0x00007fffed63edc1 in mmc_server_read_value (mmc=0x1689470,
    request=0x166c798) at /tmp/pear/temp/memcache/memcache_ascii_protocol.c:191
#3  0x00007fffed63dd7a in mmc_pool_select (pool=0x1667540)
    at /tmp/pear/temp/memcache/memcache_pool.c:1588
#4  0x00007fffed63e071 in mmc_pool_run (pool=0x1667540)
    at /tmp/pear/temp/memcache/memcache_pool.c:1674
#5  0x00007fffed634501 in php_mmc_store (ht=4, return_value=0x166c2f0,
    return_value_ptr=0x0, this_ptr=0x1666e80, return_value_used=0, op=1)
    at /tmp/pear/temp/memcache/memcache.c:525
#6  0x00007fffed637438 in zif_memcache_set (ht=4, return_value=0x166c2f0,
    return_value_ptr=0x0, this_ptr=0x1666e80, return_value_used=0)
    at /tmp/pear/temp/memcache/memcache.c:1491
#7  0x0000000000a17f6c in zend_do_fcall_common_helper_SPEC (
    execute_data=0x7ffff7e7f090)
    at /usr/local/src/php-5.3.29/Zend/zend_vm_execute.h:322
#8  0x0000000000a18640 in ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER (
    execute_data=0x7ffff7e7f090)
    at /usr/local/src/php-5.3.29/Zend/zend_vm_execute.h:430
#9  0x0000000000a17410 in execute (op_array=0x1663420)
    at /usr/local/src/php-5.3.29/Zend/zend_vm_execute.h:107
#10 0x00000000009e3387 in zend_execute_scripts (type=8, retval=0x0,
    file_count=3) at /usr/local/src/php-5.3.29/Zend/zend.c:1259
#11 0x000000000096b091 in php_execute_script (primary_file=0x7fffffffd3a0)
    at /usr/local/src/php-5.3.29/main/main.c:2316
#12 0x0000000000ac80f9 in main (argc=2, argv=0x7fffffffe618)
    at /usr/local/src/php-5.3.29/sapi/cli/php_cli.c:1189
(gdb)
 [2014-11-05 21:07 UTC] barry dot jaspan at acquia dot com
It seems to me that the problem must be in step 7; "mmc_pool_select() returns back to mmc_pool_run(). mmc_pool_run() only loops until reading->len and sending->len are zero. They are, even though the pending queue is not empty, so mmc_pool_run() returns back to PHP."

The PHP Memcache API (http://php.net/manual/en/book.memcache.php) does not have any provision for a get request returning "later" (e.g. via a callback). The get() method blocks until it returns all available values. There cannot be any point in a failure during one get() to leave an item on the pool's pending queue.

Thus, one possible fix is for mmc_pool_run() to keep looping, calling mmc_pool_select(), until the pending pool is empty, too.
 [2014-11-06 00:09 UTC] hradtke@php.net
Thanks for the analysis. I will take a look and see if I cannot resolve this.
 [2014-11-06 15:02 UTC] barry dot jaspan at acquia dot com
Great. Let me know if I can help.
 [2014-11-06 18:13 UTC] rrh at newrelic dot com
We can help too.
 [2014-11-06 19:17 UTC] rrh at newrelic dot com
I ran clang's scan-build static analyzer on the file ext/memcache/memcache.c.  It didn't report anything, such as automatic variables escaped by taking pointers to them.  I suspect this analysis, especially in the pointer soup that is zend, is beyond what scan build can do.
 [2014-11-07 16:45 UTC] barry dot jaspan at acquia dot com
I created a proposed PR for this issue at https://github.com/php/pecl-caching-memcache/pull/3. The "Add a pull request" link on this page does not work because the base repo is not one of the available options.
 [2021-03-25 16:46 UTC] cmb@php.net
-Status: Open +Status: Closed -Assigned To: +Assigned To: cmb
 [2021-03-25 16:46 UTC] cmb@php.net
The official bug tracker for PECL/memcache is now at
<https://github.com/websupport-sk/pecl-memcache/issues>.

So, if this is still an issue with either of the current memcache
versions (4 or 8), please file an issue there.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Mar 28 18:01:29 2024 UTC