Advanced Squid Caching in Scribd: Logged In Users and Complex URLs Handling
21 Jul 2009

It’s been a while since my first post about how we cache document pages at Scribd, and the approach has proven really effective since then. In this second post of the series I’d like to explain how we handle our complex document URLs and logged-in users in the caching architecture.

First of all, let’s take a look at a typical Scribd document URL: http://www.scribd.com/doc/1/Improved-Statistical-Test.

As we can see, it consists of a document-specific part (/doc/1) and a non-unique, human-readable slug part (/Improved-Statistical-Test). When a user comes to the site with a wrong slug in the document URL, we need to send the user to the correct URL with a permanent HTTP 301 redirect. So, obviously, we can’t simply pass our requests straight to Squid, because that would cause a few problems:

  • When we change a document’s title, we’d create a new cached item and would not be able to redirect users from the old URL to the new one.
  • When we change a title, we’d pollute the cache with additional copies of the document page.
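The redirect requirement described above can be illustrated with a minimal Python sketch. It is not Scribd’s actual code: the function name `canonical_redirect` and the `current_slug` lookup are hypothetical, and the sketch only assumes URLs of the form /doc/<id>/<slug> as in the example URL.

```python
from typing import Callable, Optional, Tuple


def canonical_redirect(
    path: str,
    current_slug: Callable[[int], Optional[str]],
) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a /doc/<id>/<slug> request path.

    current_slug is a hypothetical lookup from document id to the slug
    built from the document's current title (None if no such document).
    """
    parts = path.strip("/").split("/")
    # Expect at least "doc" and a numeric id; the slug may be missing or stale.
    if len(parts) < 2 or parts[0] != "doc":
        return 404, None
    if not (parts[1].isascii() and parts[1].isdigit()):
        return 404, None

    doc_id = int(parts[1])
    slug = current_slug(doc_id)
    if slug is None:
        return 404, None

    canonical = f"/doc/{doc_id}/{slug}"
    if "/" + "/".join(parts) != canonical:
        # Wrong, outdated, or missing slug: permanent redirect to the canonical URL.
        return 301, canonical
    return 200, None
```

The point of the sketch is that only the document id identifies the page. The slug is checked against the current title on each request, which is exactly the step a cache keyed on the full URL would skip.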

One more problem makes the situation even worse: the site has 3 different kinds of users.

  1. Logged-in users – active web site users who are logged in and should see their name at the top of the page, all kinds of customized parts of the page, etc. (especially when the page is their own document).
  2. Anonymous users – all users who are not logged in and visit the site with a Flash-enabled browser.
  3. Bots – all kinds of crawlers that can’t read Flash content and need to see a plain-text version of the document.

All three kinds of users should see their own document page versions whether the page is cached or not.
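To make the three-way split concrete, here is a small Python sketch of how a request might be classified and mapped to a per-audience cache key. It is only an illustration under stated assumptions, not the scheme Scribd uses: the `BOT_MARKERS` list, the `has_session` flag, and the key format are all hypothetical.

```python
# Illustrative substrings only; a real crawler list would be longer and maintained.
BOT_MARKERS = ("googlebot", "slurp", "crawler", "spider")


def visitor_kind(user_agent: str, has_session: bool) -> str:
    """Classify a request as 'logged_in', 'bot', or 'anonymous'."""
    if has_session:
        # Assumption: a valid login session takes priority over the user agent.
        return "logged_in"
    ua = user_agent.lower()
    if any(marker in ua for marker in BOT_MARKERS):
        return "bot"
    return "anonymous"


def cache_key(doc_id: int, kind: str) -> str:
    """Build a cache key from the document id and the visitor kind.

    The slug is deliberately left out, so a title change does not
    create another cached copy of the same document page.
    """
    return f"doc:{doc_id}:{kind}"
```

For example, `cache_key(1, visitor_kind("Mozilla/5.0 (compatible; Googlebot/2.1)", False))` yields a key for the bot version of document 1, independent of which slug was requested.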



The Blog v.2.0
20 Jul 2009

A long time ago, in 2002, I decided to create my own point of presence on the Internet. Back then I got a pretty nice domain (scoundrel.kremenchug.net), hacked up a few pages in PHP, added a guestbook, and that was it. For many years it was almost static, and I updated my resume page only a few times a year. Later I switched the site to WordPress to make it easier to manage my resume and stuff.

And 3 years ago, in March 2006, I decided to start my own blog. I took a standard template and started the blog on a separate domain while the site stayed on its own domain name… This spring my wife gave me a great birthday present: she created a custom blog design for me that has all the stuff I had wanted from my own web site for a long time. My friend Dima Shteflyuk helped me create a WordPress template from Tanya’s mockups, and here we are: I’ve decided to merge my blog and my web site into a single web entity called http://kovyrin.net/. Welcome to my new blog/site/whatever!