So, after a few weeks of looking for a new job I’m really excited to start my journey in a young, but very ambitious startup called Swiftype which is focused on developing a technology for private site search, that could be used on everything from small blogs to large product sites. The company is growing really fast and I’m going to lead all the work on infrastructure, build the ops team and hope to get a chance to do some coding along the way.
Stay tuned – I really hope to finally get a chance to do more blogging this year.
As of today I’m no longer working for LivingSocial and I’m looking for the next thing to work on. Since my family is in Toronto and I have an apartment (mortgage) here, I’m not looking to relocate and currently looking for something remote (I have 7+ years of remote work experience) or something local in Toronto.
For more information on my background, please check my Github profile, my linkedin profile or the resume section on this blog. If you need to contact me, feel free to use any channels listed on the contacts page.
Update: After a few initial interviews I’d like to update this post with a bit more details on what I’m looking for in the new position.
First of all, I’m really not sure I want to be yet another ops engineer working on “everything ops” in my next company. If I’d be to join a company as a regular ops engineer, I’d prefer it to be a clearly defined role with a clear focus on some set of challenging problems. I’m honestly tired of setting up cacti/nagios/chef at this point and would like the job to be a little bit more challenging.
Though even more I’m interested in being able to make strategic technical decisions for an operations team and apply my experience and knowledge for solving challenging tasks with a dedicated team of ops engineers. This could be anything from a tech ops team lead role (in a medium/large companies) to a director of technical operations (in a small-to-medium sized startups).
Update: Ok, I’ve found a new job – I work for Swiftype now!
This week, after 3 months in the works, we’ve finally released version 1.7.0 of DbCharmer ruby gem – Rails plugin that significantly extends ActiveRecord’s ability to work with multiple databases and/or database servers by adding features like multiple databases support, master/slave topologies support, sharding, etc.
New features in this release:
- Rails 3.0 support. We’ve worked really hard to bring all the features we supported in Rails 2.X to the new version of Rails and now I’m proud that we’ve implemented them all and the implementation looks much cleaner and more universal (all kinds of relations in rails 3 work in exactly the same way and we do not need to implement connection switching for all kinds of weird corner-cases in ActiveRecord).
- Forced Slave Reads functionality. Now we could have models with slaves that are not used by default, but could be turned on globally (per-controller, per-action or in a block). This is a new feature that brings our master/slave routing capabilities to a really new level – we could now use it for a really mission-critical models on demand and not be afraid of breaking major functionality of our applications by switching them to slave reads.
- Lots of changes were made in the structure of our code and tests to make sure it would be much easier for new developers to understand DbCharmer internals and make changes in its code.
Along with the new release we’ve got a brand new web site. You can find much better, cleaner and, most importantly, correct documentation for the library on the web site. We’ll be adding more examples, will try to add more in-depth explanation of our core functions, etc.
If you have any questions about the release, feel free to ask them in our new mailing list: DbCharmer Users Group.
For more updates on our releases, you can follow @DbCharmer on Twitter.
Having a reverse-proxy web cache as one of the major infrastructure elements brings many benefits for large web applications: it reduces your application servers load, reduces average response times on your site, etc. But there is one problem every developer experiences when works with such a cache – cached content invalidation.
It is a complex problem that usually consists of two smaller ones: individual cache elements invalidation (you need to keep an eye on your data changes and invalidate cached pages when related data changes) and full cache purges (sometimes your site layout or page templates change and you need to purge all the cached pages to make sure users will get new visual elements of layout changes). In this post I’d like to look at a few techniques we use at Scribd to solve cache invalidation problems.
Read the rest of this entry »
Back in November 2009 I was working on a project to port Scribd.com code base to Rails 2.2 and noticed that some old plugins we were using in 2.1 were abandoned by their authors. Some of them were just removed from the code base, but one needed a replacement – that was an old plugin called acts_as_readonlyable that helped us to distribute our queries among a cluster of MySQL slaves. There were some alternatives but we didn’t like them for one or another reasons so we’ve decided to go with creating our own ActiveRecord plugin, that would help us scale our databases out. That’s the story behind the first release of DbCharmer.
Today, six months after the first release of the gem and we’ve moved it to gemcutter (which is now the official gems hosting) and we’re already at version 1.6.11. The gem was downloaded more than 2000 times. There are (at least) 10+ large users that rely on this gem to scale their products out. And (this is the most exciting) we’ve added tons of new features to the product.
Here are the main features added since the first release:
- Much better multi-database migrations support including default migrations connection changing.
- We’ve added ActiveRecord associations preload support that makes it possible to move eager loading queries to the same connection where your finder queries go to.
- We’ve improved ActiveRecord’s query logging feature and now you can see what connections your queries executed on (and yes, all those improvements are colorized ).
- We’ve added an ability to temporary remap any ActiveRecord connections to any other connections for a block of code (really useful when you need to make sure all your queries would go to some non-default slave and you do not want to mess with all your models).
- The most interesting change: we’ve implemented some basic sharding functionality in ActiveRecord which currently is being used in production in our application.
As you can see now DbCharmer helps you to do three major scalability tasks in your Rails projects:
- Master-Slave clusters to scale out your Rails models reads.
- Vertical sharding by moving some of your models to a separate (maybe even dedicated) servers and still keep using AR associations
- Horizontal sharding by slicing your models data to pieces and placing those pieces into different databases and/or servers.
So, If you didn’t check DbCharmer out yet and you’re working on some large rails project that is (or going to be) facing scalability problems, go read the docs, download/install the gem and prove them that Rails CAN scale!
Today I’m proud to announce the first public release of our ActiveRecord database connection magic plugin: DbCharmer.
DB Charmer – ActiveRecord Connection Magic Plugin
DbCharmer is a simple yet powerful plugin for ActiveRecord that does a few things:
- Allows you to easily manage AR models’ connections (
- Allows you to switch AR models’ default connections to a separate servers/databases
- Allows you to easily choose where your query should go (
on_* methods family)
- Allows you to automatically send read queries to your slaves while masters would handle all the updates.
- Adds multiple databases migrations to ActiveRecord
Read the rest of this entry »
It’s been a while since I’ve posted my first post about the way we do document pages caching in Scribd and this approach has definitely proven to be really effective since then. In the second post of this series I’d like to explain how we handle our complex document URLs and logged in users in the caching architecture.
First of all, let’s take a look at a typical Scribd’s document URL: http://www.scribd.com/doc/1/Improved-Statistical-Test.
As we can see, it consists of a document-specific part (/doc/1) and a non-unique human-readable slug part (/Improved-Statistical-Test). When a user comes to the site with a wrong slug in the document URL, we need to make sure we send the user to the correct URL with a permanent HTTP 301 redirect. So, obviously we can’t simply send our requests to the squid because it’d cause few problems:
- When we change document’s title, we’d create a new cached item and would not be able to redirect users from the old URL to the new one
- When we change a title, we’d pollute cache with additional document page copies.
One more problem that makes the situation even worse – we have 3 different kinds of users on the site:
- Logged in users – active web site users that are logged in and should see their name at the top of the page, should see all kinds of customized parts of the page, etc (especially when a page is their own document).
- Anonymous users – all users that are not logged in and visit the site with a flash-enabled browser
- Bots – all kinds of crawlers that can’t read flash content and need to see a plain text document version
All three kinds of users should see their own document page versions whether the page is cached or not.
Read the rest of this entry »
Long time ago, in 2002 I decided to create my own point of presence in the Internet. Back then I’ve got pretty nice domain (scoundrel.kremenchug.net), hacked up a few pages on php, added a guestbook and that was it. Many years it was almost static and I did a few updates on my resume page few times a year. Later I’ve switched the site to wordpress to make it easier to manage my resume and stuff…
And 3 years ago in March 2006 I’ve decided to start my own blog. I took a standard template and started the blog on a separate domain while the domain was on its own domain name… This spring my wife made me a great birthday present – she’s created me a custom blog design that has all the stuff I wanted from my own web site for a long time. My friend Dima Shteflyuk has helped me with creating a wordpress template from Tanya’s mockups and here we are – now I’ve decided to merge my blog and my web site into a single web entity called http://kovyrin.net/. Welcome to my new blog/site/whatever!
loops is a small and lightweight framework for Ruby on Rails and Merb created to support simple background loops in your application which are usually used to do some background data processing on your servers (queue workers, batch tasks processors, etc).
Originally loops plugin was created to make our (Scribd.com) own loops code more organized. We used to have tens of different modules with methods that were called with script/runner and then used with nohup and other not so convenient backgrounding techniques. When you have such a number of loops/workers to run in background it becomes a nightmare to manage them on a regular basis (restarts, code upgrades, status/health checking, etc).
After a short time of writing our loops in more organized ways we were able to generalize most of the loops code so now our loops look like a classes with a single mandatory public method called run. Everything else (spawning many workers, managing them, logging, backgrounding, pid-files management, etc) is handled by the plugin itself.
The major idea behind this small project was to create a deadly simple and yet robust framework to be able to run some tasks in background and do not think about spawning many workers, restarting them when they die, etc. So, if you need to be able to run either one or many copies of your worker or you do not want to think about re-spawning dead workers and do not want to spend megabytes of RAM on separate copies of Ruby interpreter (when you run each copy of your loop as a separate process controlled by monit/god/etc), then I’d recommend you to try this framework — you’ll like it.
For more information, visit the project site and, of course, read the sources
Few months ago I’ve switched one of our internal projects from doing synchronous database saves of analytics data to an asynchronous processing using starling + a pool of workers. This was the day when I really understood the power of specialized queue servers. I was using database (mostly, MySQL) for this kind of tasks for years and sometimes (especially under a highly concurrent load) it worked not so fast… Few times I worked with some queue servers, but those were either some small tasks or I didn’t have a time to really get the idea, that specialized queue servers were created just to do these tasks quickly and efficiently.
All this time (few months now) I was using starling noticed really bad thing in how it works: if workers die (really die, or lock on something for a long time, or just start lagging) and queue start growing, the thing could kill your server and you won’t be able to do something about it – it just eats all your memory and this is it. Since then I’ve started looking for a better solution for our queuing, the technology was too cool to give up. I’ve tried 5 or 6 different popular solutions and all of them sucked… They ALL had the same problem – if your queue grows, this is your problem and not queue broker’s :-/ The last solution I’ve tested was ActiveMQ and either I wasn’t able to push it to its limits or it is really so cool, but looks like it does not have this memory problem. So, we’ve started using it recently.
In this small post I’d like to describe a few things that took me pretty long to figure out in ruby Stomp client: how to make queues persistent (really!) and how to process elements one by one with clients’ acknowledgments.
Read the rest of this entry »