Advanced Squid Caching for Rails Applications: Preface

Posted by Alexey Kovyrin under Databases, Development, My Projects, Networks

Since the day I joined Scribd, I have been thinking about the fact that 90+% of our traffic goes to the document view pages, which are served by a single action in our documents controller. I kept wondering how we could improve this action's responsiveness and make our users happier.

A few times I created git branches and hacked on this action, trying to implement some sort of page-level caching to make things faster. But every time the results weren't as good as I'd have liked. So the branches just sat there, waiting for a better idea.

A few months ago a good friend of mine joined Scribd and we started thinking about this problem together. As a result of our brainstorming we managed to figure out which problems were preventing us from doing efficient caching:

  • First of all, a lot of code in the action changes the page view if the visitor is a bot (no, not cloaking, just some minor adjustments to the view).
  • The second problem was a set of differences in the view between anonymous and logged-in users.
  • And finally, the third problem was that the page has a few blocks that change quite dynamically: the document stats pane and the comment lists.

Combined, these problems caused a lot of pain whenever I tried to cache the whole page. Once we had figured them out, we started thinking about how we could generalize the possible combinations of those factors and the possible approaches to caching.

There is a well-known idea in web application development: the fastest web app action is one that does not require any code to be executed on your application server. So the first idea we explored was an approach that would definitely reduce the number of hits on our app servers. It was based on the HTTP features related to the Last-Modified and ETag headers. But there was a problem: not many users visit the same page twice, so even if we made the page cacheable, it wouldn't help much. Still, the idea of full-page caching outside of the application was really good, and we started playing with it to figure out how to use it in production.
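
To illustrate the Last-Modified and ETag mechanism mentioned above, here is a minimal sketch in plain Ruby. It is not our production code: the helper names (validators_for, fresh?, respond) are hypothetical, and using an MD5 of the body as the ETag is just an assumption for the example.

```ruby
require "digest"
require "time"

# Hypothetical helper: builds the validators for a rendered page.
def validators_for(body, updated_at)
  {
    "ETag" => %("#{Digest::MD5.hexdigest(body)}"),
    "Last-Modified" => updated_at.httpdate
  }
end

# Returns true when the copy held by the client (or a proxy) is still
# current, so the server may answer 304 Not Modified with an empty body.
def fresh?(request_headers, validators)
  if (etag = request_headers["If-None-Match"])
    return etag == validators["ETag"]
  end
  if (since = request_headers["If-Modified-Since"])
    return Time.httpdate(since) >= Time.httpdate(validators["Last-Modified"])
  end
  false
end

# Returns a [status, headers, body] triple for the request.
def respond(request_headers, body, updated_at)
  validators = validators_for(body, updated_at)
  if fresh?(request_headers, validators)
    [304, validators, ""]
  else
    [200, validators, body]
  end
end
```

In a Rails controller the same check is normally done with the framework's conditional GET helpers (stale? and fresh_when) rather than by hand.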

A long time ago, when the Internet was slow and expensive, many ISPs and large companies were trying to reduce their traffic without hurting their users' experience. That is how caching proxy servers were born. The idea was for such a server to handle all web requests coming from a network (an ISP or a company office) and to cache as much content as possible, so that when the same user or some other user requested a cached page, the proxy server could return it really fast. If we implemented support for those Last-Modified headers, all proxy servers would be happy to cache our pages. But there was a problem: no one uses caching proxies in 2008 :-) So we got an idea: why not place such a server in front of our application and make it cache content for all users in the world? (Yes, we knew about caching reverse proxies before. I'm just trying to explain the flow of our thoughts and words while we were brainstorming the problem.)

The only problem with this approach would be differentiating between logged-in users, anonymous users and bots. Since our proxy server could be placed between our web servers (nginx) and the app, we decided to create an nginx module that would translate the same document page URL into a set of URLs, which would be different for each of those 3 kinds of users.
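
The URL translation idea can be sketched roughly like this. The sketch is in plain Ruby rather than an nginx module, and every specific in it is an assumption made for illustration: the bot pattern, the "logged_in" cookie name and the /cache/... URL scheme are all hypothetical.

```ruby
# Classifies a visitor as "bot", "user" (logged in) or "anon".
# The user-agent pattern and the cookie name are assumptions.
def visitor_class(user_agent, cookies, session_cookie: "logged_in")
  return "bot" if user_agent.to_s =~ /googlebot|bingbot|slurp|crawler|spider/i
  return "user" if cookies.key?(session_cookie)
  "anon"
end

# Maps the public URL to the internal URL requested from the cache,
# so that each kind of visitor gets its own cache entry.
def cache_url(path, user_agent, cookies)
  "/cache/#{visitor_class(user_agent, cookies)}#{path}"
end

puts cache_url("/doc/123", "Mozilla/5.0 (compatible; Googlebot/2.1)", {})
puts cache_url("/doc/123", "Mozilla/5.0 Firefox/3.0", { "logged_in" => "1" })
puts cache_url("/doc/123", "Mozilla/5.0 Firefox/3.0", {})
```

The point of the design is that the caching proxy never has to understand cookies or user agents: it only ever sees distinct URLs.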

Once all those problems with different kinds of users were solved, we moved on to the last one: the non-cacheable dynamic stats pane. The solution was pretty simple: we added a small AJAX call to the page that updates the stats on the cached version of the page for all real users, while bots see the same page with a slightly stale stats pane.

Long story short, the results are really great. Application server load dropped by 50-70%, database server load dropped by 30-60%, and response times went down from 500-750 msec to 150-200 msec. As an additional positive effect of the caching, we managed to remove all fragment caches from the application and free up more memcached resources for data caches. Here are a few Cacti graphs of our servers' load/traffic (the caching was introduced on Oct 9th at night):

Main MySQL command counters:

[Cacti graph image]


One of our Application Servers CPU Usage:

[Cacti graph image]


One of our Application Servers Load Average:

[Cacti graph image]


There is too much to share about this caching experience in a single post, so I've decided to write a series of posts explaining the problems we had and the solutions we found for each part of the caching system.

So, if you’re interested in details, subscribe to this blog’s RSS feed and in a few days you’ll see the first article from this series.


Related posts:

  1. Advanced Squid Caching in Scribd: Cache Invalidation Techniques
  2. Using Nginx, SSI and Memcache to Make Your Web Applications Faster
  3. Advanced Squid Caching in Scribd: Logged In Users and Complex URLs Handling
  4. Advanced Squid Caching in Scribd: Hardware + Software Used
  5. Dog-pile Effect and How to Avoid it with Ruby on Rails memcache-client Patch

10 Responses to this entry

Dan Kubb says:

Alexey, you mentioned you used Squid for this, but have you tried Varnish? After hearing good things about it I experimented with it a bit with good results. I've not had a chance to test it with a high volume production system though and I'm curious to hear from others who have.

Scoundrel says:

Before choosing Squid 3.0 I discussed this with the Percona guys, and they shared with me some really weird stories from their experience with Varnish (constant memory leaks, crashes, etc). So I decided to go with Squid, and I'm pretty happy with the result so far.

The only thing that bothers me is that the box has two cores and Squid uses only one of them… but so far, even on one core, it has been just awesome (Squid 3.0, that is! The 2.X versions were just awful during the first two days when I was setting things up).

Tobias Luetke says:

Since we are talking about server-side caching, I have a question for you:

My Rails app supports 304 Not Modified for a lot of key URLs, some of which return binary images. I want to use a server-side caching proxy to keep the content, but it's important that requests still make it to the Rails app for statistics collection.

The Rails app will reply with 304, which should let the caching proxy release its content to the client. That's at least how I think this should work. Varnish currently disagrees with me no matter how I set up the Cache-Control header.

Scoundrel says:

Yes, I forgot to mention that we do hit the backend app for the same statistics reasons. We set the Cache-Control header to 'public, max-age=0, must-revalidate', which makes Squid check with the backend on each request. I'll give more details in one of the following posts, where I'll explain what we do on the application side and how. I think this will actually be the next post, since it could be the most useful one for others.
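
A minimal sketch of that revalidation flow, with toy stand-ins for both sides. The Backend and RevalidatingCache classes are hypothetical and only model the behaviour described above (the MD5-based ETag is also an assumption); they are not Squid's or Rails' actual code.

```ruby
require "digest"

# Toy backend standing in for the Rails app: it counts every hit (the
# statistics) and answers 304 when the validator still matches.
class Backend
  attr_reader :hits

  def initialize(body)
    @body = body
    @hits = 0
  end

  def call(if_none_match = nil)
    @hits += 1
    etag = %("#{Digest::MD5.hexdigest(@body)}")
    headers = {
      "ETag" => etag,
      "Cache-Control" => "public, max-age=0, must-revalidate"
    }
    return [304, headers, nil] if if_none_match == etag
    [200, headers, @body]
  end
end

# Toy proxy standing in for Squid: it stores one response and revalidates
# it on every request, because max-age=0 makes the stored copy stale at once.
class RevalidatingCache
  def initialize(backend)
    @backend = backend
    @stored = nil
  end

  def get
    etag = @stored && @stored[:etag]
    status, headers, body = @backend.call(etag)
    @stored = { etag: headers["ETag"], body: body } if status == 200
    @stored[:body]
  end
end
```

Every request still reaches the backend, so the statistics stay accurate, but only the first one makes it render and send the full page.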

Ryan Tomayko says:

From what I've been able to determine, Varnish does not support validation (If-Modified-Since, If-None-Match). It supports the freshness model (Expires, Cache-Control/max-age) only. I was pretty sure Varnish 2.0 was going to add support for validation, but it doesn't look like it has made it in yet.

Joerg says:

Have you ever thought about splitting the page into a static, cacheable part and a dynamic part, which are then merged on the frontend web servers using Server Side Includes?

Scoundrel says:

Yes, this was the first thing I tried (one of those many dead branches). The problem was the following: we need lots of data to generate all the stuff on the page. Document info, comments, user info, owner info, stats, document security settings, bot information and much more. If we just took the main template and replaced it with SSI/ESI code, it wouldn't reduce the load on our db machines at all, and it would be painful to make all those SSI/ESI actions dependent on the current user_id, bot/non-bot status, etc. And finally, we'd be forced to load those pieces of data for each small piece of HTML we generate for the ESI/SSI template. So, after implementing a huge part of this idea, I just ended up stuck with a messy and weird configuration/code.

WK says:

“We’ve decided to create a nginx module that would translate the same document page URLs to a set of URLs, which would be different for all those 3 kinds of users.”

Just to share: instead of URL rewriting, we approached this problem using Squid's X-Accelerator-Vary feature. We make the load balancer inject headers (e.g. X-User-Type) and make the page return the appropriate X-Accelerator-Vary header. Squid then selects the appropriate version of the page by matching the request headers against the cached copy's X-A-V fields.
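
As a rough illustration of variant selection by header, here is a toy model in Ruby. The VaryingCache class is hypothetical and only mimics the idea described above; it does not reproduce how Squid implements X-Accelerator-Vary internally.

```ruby
# Toy cache that keys stored responses by URL plus the values of the
# request headers named in the response's X-Accelerator-Vary header.
class VaryingCache
  def initialize
    @vary = {}     # url => names of the request headers to vary on
    @entries = {}  # [url, header values] => body
  end

  def store(url, request_headers, response_headers, body)
    names = response_headers.fetch("X-Accelerator-Vary", "").split(",").map(&:strip)
    @vary[url] = names
    @entries[[url, request_headers.values_at(*names)]] = body
  end

  # Returns the stored body for this variant, or nil on a cache miss.
  def lookup(url, request_headers)
    names = @vary[url]
    return nil unless names
    @entries[[url, request_headers.values_at(*names)]]
  end
end
```

With this approach the URL stays the same for everyone, and the variant is chosen by the X-User-Type value injected in front of the cache.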