We had been using memcached in our application for a long time, and it helped a lot to reduce the load that some huge queries put on our DB servers. But there was a problem (sometimes called the “dog-pile effect”): when a cached value expired under heavy traffic, too many threads in our application would try to calculate the new value at the same time in order to cache it.
For example, if you have a simple but really bad query like
```sql
SELECT COUNT(*) FROM some_table WHERE some_flag = X
```
which could be really slow on huge tables, and your cache expires, then ALL of your clients hitting a page with this counter will end up waiting for the counter to be updated. At times there could be tens or even hundreds of such queries running on your DB, killing the server and breaking the entire application (the number of application instances is constant, but more and more of them become locked waiting for the counter).
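To make the failure mode concrete, here is a minimal sketch of the naive single-key pattern that triggers the stampede (the key name and the `some_huge_calc` helper are illustrative; `Cache.get` with a block is the memcache-client API the patch below builds on):

```ruby
# Naive caching: when 'some_table.counter' expires, EVERY concurrent
# request misses the cache and runs the slow query at the same time.
count = Cache.get('some_table.counter') do
  some_huge_calc # e.g. the slow COUNT(*) query above
end
```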
So, how could we avoid this problem? The first thing that came to my mind was: “What if we marked the old counter as ‘expired’ and then only one thread would recalculate the counter while all other clients kept using the old value?” The idea looked great, but when we cache something in memcached, it is hard to tell when a value was saved to the cache and when it is going to expire. After a bit of research I found a much more elegant solution: we could create two keys in memcached: a MAIN key with an expiration time a bit longer than normal, plus a STALE key which expires earlier. So, when we try to read a value from memcached, we try to read the STALE key too. If the STALE key has expired, it is time to start the recalculation (and to set the STALE key again with some short TTL).
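To put concrete numbers on it (using the values from the usage example below): with a TTL of 100 seconds and a generation time of 30 seconds, the MAIN key is stored for 100 + 2 × 30 = 160 seconds and the STALE key for 100 seconds. At t = 100 the STALE key disappears; the first client to notice re-sets it with a 30-second TTL and starts recalculating, while everyone else keeps reading the MAIN value, which stays valid until t = 160.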
The final solution we ended up using is the following (a monkey patch for the ActiveRecord::Cache class from RobotCoop's memcache-client library):
```ruby
# Anti-dog-pile effect caching extension
module ActiveRecord
  class << Cache
    STALE_REFRESH = 1
    STALE_CREATED = 2

    # Caches data received from a block.
    #
    # The difference between this method and the usual Cache.get is
    # the following: this method caches data and lets the user
    # re-generate it upon expiration w/o running the data generation
    # code more than once, so the dog-pile effect won't bring our
    # servers down.
    #
    def smart_get(key, ttl = nil, generation_time = 30.seconds)
      # Fall back to the default caching approach if no ttl given
      return get(key) { yield } unless ttl

      # Create a window for data refresh
      real_ttl  = ttl + generation_time * 2
      stale_key = "#{key}.stale"

      # Try to get data from memcached
      value = get(key)
      stale = get(stale_key)

      # If the stale key has expired, it is time to re-generate our data
      unless stale
        put(stale_key, STALE_REFRESH, generation_time) # lock
        value = nil # force data re-generation
      end

      # If no data was retrieved or re-generation was forced,
      # re-generate the data and reset the stale key
      unless value
        value = yield
        put(key, value, real_ttl)
        put(stale_key, STALE_CREATED, ttl) # unlock
      end

      return value
    end
  end
end
```
Since it is a monkey patch, you can place this piece of code wherever you want, as long as it is loaded AFTER memcache-client (for example, you can put it in your config/initializers/ directory or just paste it into your environment.rb).
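For instance, the initializer could look like this (the filename is just an illustration; `require 'memcache'` is how the memcache-client gem is loaded):

```ruby
# config/initializers/smart_cache.rb
# Make sure memcache-client is loaded before the patch is applied
require 'memcache'

# ... the ActiveRecord::Cache monkey patch from above goes here ...
```

Example usage of this patch: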
```ruby
# This falls back to the generic get() method because no TTL was provided
Cache.smart_get('test') { some_huge_calc }

# This caches the calculation result for 160 seconds (100 + 2 * 30)
# and starts re-generating the cache after 100 seconds
Cache.smart_get('test', 100) { some_huge_calc }

# This caches the calculation result for 120 seconds (100 + 2 * 10)
# and starts re-generating the cache after 100 seconds
Cache.smart_get('test', 100, 10) { some_huge_calc }
```
So, this is it: with a simple change we've fixed a really annoying problem and made our application much more stable.