We had been using memcached in our application for a long time, and it helped a lot to reduce the load that some huge queries put on our DB servers. But there was a problem (sometimes called the “dog-pile effect”): when a cached value expired under heavy traffic, too many threads in our application would try to calculate the new value at the same time in order to cache it.
For example, if you have a simple but really bad query like
```sql
SELECT COUNT(*) FROM some_table WHERE some_flag = X
```
which could be really slow on huge tables, and your cache expires, then ALL of your clients hitting a page with this counter will end up waiting for the counter to be updated. At times there could be tens or even hundreds of such queries running on your DB, killing the server and breaking the entire application (the number of application instances is constant, but more and more of them become locked waiting for the counter).
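To make the failure mode concrete, here is a minimal sketch of the naive single-key pattern that triggers the stampede (the key name and the `some_huge_calc` helper are illustrative; `Cache.get` with a block is the memcache-client API the patch below builds on):

```ruby
# Naive caching: when 'some_table.counter' expires, EVERY concurrent
# request misses the cache and runs the slow query at the same time.
count = Cache.get('some_table.counter') do
  some_huge_calc # e.g. the slow COUNT(*) query above
end
```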
So, how could we avoid this problem? The first thing that came to my mind was: “What if we marked the old counter as ‘expired’ and then only one thread would recalculate the counter while all other clients kept using the old value?” The idea looked great, but when we cache something in memcached, it is hard to tell when a value was saved to the cache and when it is going to expire. After a bit of research I found a much more elegant solution: we could create two keys in memcached: a MAIN key with an expiration time a bit longer than normal, plus a STALE key which expires earlier. So, when we try to read a value from memcached, we try to read the STALE key too. If the STALE key has expired, it is time to start the recalculation (and to set the STALE key again with some short TTL).
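To put concrete numbers on it (using the values from the usage example below): with a TTL of 100 seconds and a generation time of 30 seconds, the MAIN key is stored for 100 + 2 × 30 = 160 seconds and the STALE key for 100 seconds. At t = 100 the STALE key disappears; the first client to notice re-sets it with a 30-second TTL and starts recalculating, while everyone else keeps reading the MAIN value, which stays valid until t = 160.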
The final solution we ended up using is the following (a monkey patch for the ActiveRecord::Cache class from RobotCoop's memcache-client library):
```ruby
# Anti-dog-pile effect caching extension
module ActiveRecord
  class << Cache
    STALE_REFRESH = 1
    STALE_CREATED = 2

    # Caches data received from a block.
    #
    # The difference between this method and the usual Cache.get is
    # the following: this method caches data and lets the user
    # re-generate it upon expiration w/o running the data generation
    # code more than once, so the dog-pile effect won't bring our
    # servers down.
    #
    def smart_get(key, ttl = nil, generation_time = 30.seconds)
      # Fall back to the default caching approach if no ttl given
      return get(key) { yield } unless ttl

      # Create a window for data refresh
      real_ttl  = ttl + generation_time * 2
      stale_key = "#{key}.stale"

      # Try to get data from memcached
      value = get(key)
      stale = get(stale_key)

      # If the stale key has expired, it is time to re-generate our data
      unless stale
        put(stale_key, STALE_REFRESH, generation_time) # lock
        value = nil # force data re-generation
      end

      # If no data was retrieved or re-generation was forced,
      # re-generate the data and reset the stale key
      unless value
        value = yield
        put(key, value, real_ttl)
        put(stale_key, STALE_CREATED, ttl) # unlock
      end

      return value
    end
  end
end
```
Since it is a monkey patch, you can place this piece of code wherever you want, as long as it is loaded AFTER memcache-client (for example, you can put it in your config/initializers/ directory or just paste it into your environment.rb).
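For instance, the initializer could look like this (the filename is just an illustration; `require 'memcache'` is how the memcache-client gem is loaded):

```ruby
# config/initializers/smart_cache.rb
# Make sure memcache-client is loaded before the patch is applied
require 'memcache'

# ... the ActiveRecord::Cache monkey patch from above goes here ...
```

Example usage of this patch: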
```ruby
# This falls back to the generic get() method because no TTL was provided
Cache.smart_get('test') { some_huge_calc }

# This caches the calculation result for 160 seconds (100 + 2 * 30)
# and starts re-generating the cache after 100 seconds
Cache.smart_get('test', 100) { some_huge_calc }

# This caches the calculation result for 120 seconds (100 + 2 * 10)
# and starts re-generating the cache after 100 seconds
Cache.smart_get('test', 100, 10) { some_huge_calc }
```
So, this is it: with a simple change we've fixed a really annoying problem and made our application much more stable.