Bug #62801 MemcacheD addServers slowed down after PHP upgrade.
Submitted: 2012-08-12 11:15 UTC
Modified: 2012-09-30 17:05 UTC
From: quentin389 at gmail dot com
Assigned:
Status: Open
Package: memcached (PECL)
PHP Version: 5.3.15
OS: CentOS 5.8
Private report: No
CVE-ID: None

 [2012-08-12 11:15 UTC] quentin389 at gmail dot com
Description:
------------
Yesterday we upgraded PHP on our production CentOS servers from 5.3.13 to 5.3.15. We didn't test the new version before updating all the servers, and now we're stuck with the latest version, which seems to have broken memcached support.
We have 6 PHP servers that connect to two separate memcached servers (for different data), and we log every script that takes more than 30 ms to create a memcached connection.
Our two memcached servers are on different machines, and at non-busy times phpMemcachedAdmin reports stats like these:
20 active connections
500 requests / s
50 sets / s
150 Kb read /s 
650 Kb write / s
average time to connect ~ 3 ms

After the PHP upgrade we first noticed a lot of PAYLOAD FAILURE errors, then WRITE FAILURE errors, and then our servers went crazy, so we restarted all PHP and memcached servers.
Those problems have passed and may not be connected to the main problem, which is that the connection time to the memcached servers went from almost never exceeding 30 ms to exceeding it a lot, and reaching as high as 500 ms.
The change from no problem to a lot of slow connections happened on each PHP server separately, at the time that server was upgraded, so it WAS a result of the PHP upgrade and restart on each server.

Test script:
---------------
This is what we do:

$this->memcached = new Memcached();
$server_list = $this->memcached->getServerList();
if (empty($server_list))
  if (!$this->memcached->addServers($servers)) throw new Exception('whoops!');
if (!$this->memcached->setOption(Memcached::OPT_SERIALIZER, Memcached::SERIALIZER_IGBINARY)) throw new Exception('whoops!');
if (!$this->memcached->setOption(Memcached::OPT_BINARY_PROTOCOL, true)) throw new Exception('whoops!');
if (!$this->memcached->setOption(Memcached::OPT_TCP_NODELAY, true)) throw new Exception('whoops!');


Expected result:
----------------
Taken together, all 7 lines almost never took more than 30 ms, on any script under any circumstances. Whether the average time was 1 ms or 20 ms, I don't know.

By 'almost never' I mean 120 calls above 30 ms in two days, averaging 40 ms, with none above 200 ms.

Actual result:
--------------
Since the servers upgrade and restarts:
time - 16 hours
10500 calls above 30 ms
1350 calls above 50 ms
156 calls above 100 ms
34 calls above 500 ms

I've tested which part of our script causes those long times, and about 95% of the 'lag spikes' happen at ->addServers(), with everything else executing in 0 ms.
However, from time to time the spike happens when setting one of the options, and all the other parts take 0 ms.
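
A per-step measurement like the one described above could be sketched as follows. This is only an illustration, not the code used in this report: timeStep() and isSlow() are hypothetical helper names, the 30 ms threshold is the one quoted in the description, and the usleep() call is a stand-in for a real call such as addServers().

```php
<?php
// Hypothetical helper: runs $step and returns array(result, elapsed milliseconds).
function timeStep($step)
{
    $start = microtime(true);
    $result = call_user_func($step);
    $elapsedMs = (microtime(true) - $start) * 1000.0;
    return array($result, $elapsedMs);
}

// Hypothetical helper: true when a measurement exceeds the logging threshold.
function isSlow($elapsedMs, $thresholdMs = 30.0)
{
    return $elapsedMs > $thresholdMs;
}

// Usage with a stand-in step. In a real script each closure would wrap one
// call, for example $memcached->addServers($servers) or one setOption().
list($ok, $ms) = timeStep(function () {
    usleep(2000); // stand-in for a network call
    return true;
});

if (isSlow($ms)) {
    error_log(sprintf('slow step: %.1f ms', $ms));
}
```

Wrapping each of the calls separately is what makes it possible to attribute a spike to addServers() rather than to one of the setOption() calls.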

History

 [2012-08-12 16:17 UTC] quentin389 at gmail dot com
Something that MAY be unrelated:
I have a feeling that something changed in the serialization method after the upgrade.
- We were getting 'payload failures', which suggests that memcached values stored before the upgrade could not be deserialized.
- For some reason the amount of data in one of our memcached servers grew by 15% or so. Before the upgrade, with fully cached keys, we were using around 85% of the memcached server's space. After the upgrade and the forced memcached restart, all of the space is used, even though the keys are more or less the same.
Please note that both before and after the upgrade we were using igbinary as the serializer (as shown in the code example), so nothing changed there.
 [2012-08-12 16:26 UTC] quentin389 at gmail dot com
-Operating System: CentOS
+Operating System: CentOS 5.8
 [2012-08-12 16:26 UTC] quentin389 at gmail dot com
Our PHP comes from Remi repo.
 [2012-08-12 18:14 UTC] andrei@php.net
Which libmemcached version are you using?
 [2012-08-12 19:35 UTC] quentin389 at gmail dot com
We were using 1.0.2:
Feb 10 16:20:53 Updated: libmemcached-1.0.2-1.el5.remi.x86_64

And the update installed 1.0.4:
Aug 11 19:49:13 Updated: libmemcached-1.0.4-1.el5.remi.x86_64
 [2012-08-13 17:21 UTC] andrei@php.net
And it was faster with 1.0.2?
 [2012-08-13 17:27 UTC] quentin389 at gmail dot com
I can't tell whether it was 'with 1.0.2' or with something else from our older PHP package. But yes, it was faster.

Perhaps much faster even. Certainly there weren't that many time spikes.

As I wrote:
- before the upgrade we had 120 memcached connects above 30 ms in two days
- after the upgrade, in just 16 hours, we saw more than 10,000 connects above 30 ms, some of them as high as 500 ms

That's a consistent and visible change from before the upgrade.
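
To put the two measurement windows on the same scale, the counts can be converted to hourly rates. The sketch below uses the figures quoted in this report (120 calls in two days, 10500 calls in 16 hours) and assumes both counts cover the same set of PHP servers.

```php
<?php
// Figures quoted in this report.
$beforeCalls = 120;   // calls above 30 ms before the upgrade
$beforeHours = 48;    // two days
$afterCalls  = 10500; // calls above 30 ms after the upgrade
$afterHours  = 16;

$beforeRate = $beforeCalls / $beforeHours; // slow calls per hour, before
$afterRate  = $afterCalls / $afterHours;   // slow calls per hour, after
$ratio      = $afterRate / $beforeRate;    // how many times more frequent

printf("before: %.2f/h, after: %.2f/h, ratio: %.1fx\n", $beforeRate, $afterRate, $ratio);
// prints "before: 2.50/h, after: 656.25/h, ratio: 262.5x"
```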
 [2012-08-13 23:32 UTC] andrei@php.net
After some investigation, it looks like libmemcached was recently changed to 
attempt connection to the server upon adding it to the list. So, calling 
addServer() or addServers() results in connection attempts to the memcached 
servers. I don't know why this change was made, but it would explain why you're 
seeing a slowdown upon adding servers.

I'll try to find out why this change was made.
 [2012-08-18 14:12 UTC] quentin389 at gmail dot com
Indeed, it seems that before the upgrade our memcached connections also had some lag spikes, but they showed up when calling get() or set(), not addServers().

So it seems that the spikes simply moved to addServers(), which means that our problem lies somewhere on our servers, as there shouldn't be lags like that when using memcached at all. We will investigate it further ourselves.

However, it seems really strange to me that connecting was moved to addServers().
With 1 server per cluster it's fine. But what if we had 20? Would that mean 20 new connections on every script start? I assume that before the change a connection was created only to the server where a particular key was stored. So while it was possible for one script call to connect to all 20 servers, it was unlikely, and if you got or set fewer than 20 keys it was simply impossible.
If that's the case (and I'm not sure, as I don't know the inner workings of PHP memcached), then this change seems like a really bad one.
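
One way to limit how often addServers() runs is to construct Memcached with a persistent ID, so that the instance and its server list can be reused between requests served by the same PHP process. The sketch below is an assumption-laden illustration, not something proposed in this report: getSharedMemcached(), the 'cache_pool' ID and the 127.0.0.1:11211 server are hypothetical, and whether connections are actually reused depends on the SAPI and on the memcached/libmemcached versions in use.

```php
<?php
// Sketch only. 'cache_pool' and the server list below are hypothetical.
function getSharedMemcached($persistentId, array $servers)
{
    if (!class_exists('Memcached')) {
        return null; // memcached extension not loaded; nothing to demonstrate
    }

    // A persistent ID asks the extension to keep this instance alive
    // between requests handled by the same PHP process.
    $memcached = new Memcached($persistentId);

    // Add servers only when the (possibly reused) instance has none yet.
    if (count($memcached->getServerList()) === 0) {
        if (!$memcached->addServers($servers)) {
            throw new Exception('addServers() failed');
        }
    }

    return $memcached;
}

$cache = getSharedMemcached('cache_pool', array(
    array('127.0.0.1', 11211), // hypothetical host and port
));
```

Without a persistent ID, getServerList() on a freshly constructed instance is always empty, so a check like the one in the test script would call addServers() on every request.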
 [2012-09-30 17:05 UTC] andrei@php.net
The response from libmemcached folks was this: "In benchmarks we found that by 
beginning to setup the connection earlier that the overall time was lower."
 [2013-04-22 14:15 UTC] arjen at react dot com
Documentation @ http://php.net/manual/en/memcached.addservers.php must be updated 
to reflect the new behaviour in libmemcached > 1.0.2.
 
Last updated: Sat Oct 19 14:01:28 2019 UTC