Posts Tagged ‘scribd’

DB Charmer – ActiveRecord Connection Magic Plugin

Posted by Alexey Kovyrin under Databases, Development, My Projects

Today I’m proud to announce the first public release of our ActiveRecord database connection magic plugin: DbCharmer.


DB Charmer – ActiveRecord Connection Magic Plugin

DbCharmer is a simple yet powerful plugin for ActiveRecord that does a few things:

  1. Allows you to easily manage AR models’ connections (switch_connection_to method)
  2. Allows you to switch AR models’ default connections to a separate servers/databases
  3. Allows you to easily choose where your query should go (on_* methods family)
  4. Allows you to automatically send read queries to your slaves while masters would handle all the updates.
  5. Adds multiple databases migrations to ActiveRecord

Read the rest of this entry »

Advanced Squid Caching in Scribd: Logged In Users and Complex URLs Handling

Posted by Alexey Kovyrin under Admin-tips, Development, My Projects

It’s been a while since I’ve posted my first post about the way we do document pages caching in Scribd and this approach has definitely proven to be really effective since then. In the second post of this series I’d like to explain how we handle our complex document URLs and logged in users in the caching architecture.

First of all, let’s take a look at a typical Scribd’s document URL: http://www.scribd.com/doc/1/Improved-Statistical-Test.

As we can see, it consists of a document-specific part (/doc/1) and a non-unique human-readable slug part (/Improved-Statistical-Test). When a user comes to the site with a wrong slug in the document URL, we need to make sure we send the user to the correct URL with a permanent HTTP 301 redirect. So, obviously we can’t simply send our requests to the squid because it’d cause few problems:

  • When we change document’s title, we’d create a new cached item and would not be able to redirect users from the old URL to the new one
  • When we change a title, we’d pollute cache with additional document page copies.

One more problem that makes the situation even worse – we have 3 different kinds of users on the site:

  1. Logged in users – active web site users that are logged in and should see their name at the top of the page, should see all kinds of customized parts of the page, etc (especially when a page is their own document).
  2. Anonymous users – all users that are not logged in and visit the site with a flash-enabled browser
  3. Bots – all kinds of crawlers that can’t read flash content and need to see a plain text document version

All three kinds of users should see their own document page versions whether the page is cached or not.

Read the rest of this entry »

Loops plugin for rails and merb released

Posted by Alexey Kovyrin under Development, Links, My Projects

loops is a small and lightweight framework for Ruby on Rails and Merb created to support simple background loops in your application which are usually used to do some background data processing on your servers (queue workers, batch tasks processors, etc).

Originally loops plugin was created to make our (Scribd.com) own loops code more organized. We used to have tens of different modules with methods that were called with script/runner and then used with nohup and other not so convenient backgrounding techniques. When you have such a number of loops/workers to run in background it becomes a nightmare to manage them on a regular basis (restarts, code upgrades, status/health checking, etc).

After a short time of writing our loops in more organized ways we were able to generalize most of the loops code so now our loops look like a classes with a single mandatory public method called run. Everything else (spawning many workers, managing them, logging, backgrounding, pid-files management, etc) is handled by the plugin itself.

The major idea behind this small project was to create a deadly simple and yet robust framework to be able to run some tasks in background and do not think about spawning many workers, restarting them when they die, etc. So, if you need to be able to run either one or many copies of your worker or you do not want to think about re-spawning dead workers and do not want to spend megabytes of RAM on separate copies of Ruby interpreter (when you run each copy of your loop as a separate process controlled by monit/god/etc), then I’d recommend you to try this framework — you’ll like it.

For more information, visit the project site and, of course, read the sources :-)

Rails Developer for a Large Startup: My Vision of an Ideal Candidate

Posted by Alexey Kovyrin under Databases, Development, General, Links

Few days ago we were chatting in our corporate Campfire room and one of the guys asked me what do I think about Rails developers hiring process, what questions I’d ask a candidate, etc… This question started really long and interesting discussion and I’d like to share my thoughts on this question in this post.

Read the rest of this entry »

ActiveMQ Tips: Flow Control and Stalled Producers Problem

Posted by Alexey Kovyrin under Admin-tips, Development

It’s been a few months since we‘ve started actively using ActiveMQ queue server in our project. For some time we had pretty weird problems with it and even started thinking about switching to something else or even writing our own queue server which would comply with our requirements. The most annoying problem was the following: some time after activemq restart everything worked really well and then activemq started lagging, queue started growing and all producer processes were stalling on push() operations. We rewrote our producers from Ruby to JRuby, then to Java and still – after some time everything was in a bad shape until we restarted the queue server.

Read the rest of this entry »

ActiveMQ + Ruby Stomp Client: How to process elements one by one

Posted by Alexey Kovyrin under Admin-tips, Databases, Development, My Projects

Few months ago I’ve switched one of our internal projects from doing synchronous database saves of analytics data to an asynchronous processing using starling + a pool of workers. This was the day when I really understood the power of specialized queue servers. I was using database (mostly, MySQL) for this kind of tasks for years and sometimes (especially under a highly concurrent load) it worked not so fast… Few times I worked with some queue servers, but those were either some small tasks or I didn’t have a time to really get the idea, that specialized queue servers were created just to do these tasks quickly and efficiently.

All this time (few months now) I was using starling noticed really bad thing in how it works: if workers die (really die, or lock on something for a long time, or just start lagging) and queue start growing, the thing could kill your server and you won’t be able to do something about it – it just eats all your memory and this is it. Since then I’ve started looking for a better solution for our queuing, the technology was too cool to give up. I’ve tried 5 or 6 different popular solutions and all of them sucked… They ALL had the same problem – if your queue grows, this is your problem and not queue broker’s :-/ The last solution I’ve tested was ActiveMQ and either I wasn’t able to push it to its limits or it is really so cool, but looks like it does not have this memory problem. So, we’ve started using it recently.

In this small post I’d like to describe a few things that took me pretty long to figure out in ruby Stomp client: how to make queues persistent (really!) and how to process elements one by one with clients’ acknowledgments.
Read the rest of this entry »

Advanced Squid Caching for Rails Applications: Preface

Posted by Alexey Kovyrin under Databases, Development, My Projects, Networks

Since the day one when I joined Scribd, I was thinking about the fact that 90+% of our traffic is going to the document view pages, which is a single action in our documents controller. I was wondering how could we improve this action responsiveness and make our users happier.

Few times I was creating a git branches and hacking this action trying to implement some sort of page-level caching to make things faster. But all the time results weren’t as good as I’d like them to be. So, branches were sitting there and waiting for a better idea.
Read the rest of this entry »

Bounces-handler Released

Posted by Alexey Kovyrin under Development, My Projects, Networks

Today I’ve managed to finish initial version of our bounces-handler package we use for mailing-related stuff in Scribd.

Bounces-handler package is a simple set of scripts to automatically process email bounces and ISP‘s feedback loops emails, maintain your mailing blacklists and a Rails plugin to use those blacklists in your RoR applications.

This piece of software has been developed as a part of more global work on mailing quality improvement in Scribd.com, but it was one of the most critical steps after setting up reverse DNS records, DKIM and SPF.

The package itself consists of two parts:

  • Perl scripts to process incoming email:
    • bounces processor — could be assigned to process all your bounce emails
    • feedback loops messages processor — more specific for Scribd, but still – could be modified for your needs (will be released soon).
  • Rails plugin to work with mailing blacklists

For more information, please check our README file. If you have any questions, comments or suggestions, please leave them here as a comments and I’ll try to reply as soon as possible.

Using Sphinx for Non-Fulltext Queries

Posted by Alexey Kovyrin under Admin-tips, Databases, Development, My Projects

How often do you think about the reasons why your favorite RDBMS sucks? :-) Last few months I was doing this quite often and yes, my favorite RDBMS is MySQL. The reason why I was thinking so because one of my recent tasks at Scribd was fixing scalability problems in documents browsing.

The problem with browsing was pretty simple to describe and as hard to fix – we have large data set which consists of a few tables with many fields with really bad selectivity (flag fields like is_deleted, is_private, etc; file_type, language_id , category_id and others). As the result of this situation it becomes really hard (if possible at all) to display documents lists like “most popular 1-10 pages PDF documents in Italian language from the category “Business” (of course, non-deleted, non-private, etc). If you’ll try to create appropriate indexes for each possible filters combination, you’ll end up having tens or hundreds of indexes and every INSERT query in your tables will take ages.

Read the rest of this entry »