Category: Development
Using Sphinx for Non-Fulltext Queries
19 May2008

How often do you think about the reasons why your favorite RDBMS sucks? 🙂 Last few months I was doing this quite often and yes, my favorite RDBMS is MySQL. The reason why I was thinking so because one of my recent tasks at Scribd was fixing scalability problems in documents browsing.

The problem with browsing was pretty simple to describe and as hard to fix – we have large data set which consists of a few tables with many fields with really bad selectivity (flag fields like is_deleted, is_private, etc; file_type, language_id , category_id and others). As the result of this situation it becomes really hard (if possible at all) to display documents lists like “most popular 1-10 pages PDF documents in Italian language from the category “Business” (of course, non-deleted, non-private, etc). If you’ll try to create appropriate indexes for each possible filters combination, you’ll end up having tens or hundreds of indexes and every INSERT query in your tables will take ages.

Read the rest of this entry


Command Line History
28 Apr2008

Inspired by the Rail Spikes:

1
2
3
4
5
6
7
8
9
10
11
12
bash-3.2$ history 1000 | awk '{a[$2]++}END{for(i in a){print a[i] " " i}}' | sort -rn | head
228 cd
167 git
10 ssh
10 DEPLOY=production
6 sudo
6 pwd
6 ./script/import_views.rb
5 rm
4 rake
4 mv
bash-3.2$

Really interesting stats, I’d never guess that git is used more than ssh on my desktop (I’m a remote worker and mysql consultant so I ssh really often). 🙂


InnoDB Recovery toolset Version 0.3 Released
14 Apr2008

Even though I didn’t go to MySQL conf this year (really sad about this), this week is gonna be most active in the community so I decided to do some community stuff too 🙂 Today I’ve released version 0.3 of our innodb recovery toolkit. Now it became much faster, stable and accurate. At this moment it is possible to recover almost any table from corrupted/deleted tablespace without so much effort as it was before. Here is a short changes list (since 0.1 announced here):

  • More MySQL data types added: DECIMAL (both old and new), DATE, TIME
  • CHAR data type handling improved in table definitions generator
  • Indexes filtering added to page_parser
  • 64-bit stat() support added to all tools
  • Linux has no isnumber() function so we define our own implementation (pretty simple)
  • Lots of fixes in create_defs.pl script – now it generates definitions which could recover your data in 80% cases w/o any changes.
  • Min/max record size calculation fixed in constraints-based parser.
  • Nullable fixed-size columns support is fixed.
  • Debug logging is much cleaner now.

As always, if you need any help with your recovery, we would love to help.


Dog-pile Effect and How to Avoid it with Ruby on Rails memcache-client Patch
10 Mar2008

We were using memcache in our application for a long time and it helped a lot to reduce DB servers load on some huge queries. But there was a problem (sometimes called a “dog-pile effect”) – when some cached value was expired and we had a huge traffic, sometimes too many threads in our application were trying to calculate new value to cache it.

For example, if you have some simple but really bad query like

1
SELECT COUNT(*) FROM some_table WHERE some_flag = X

which could be really slow on a huge tables, and your cache expires, then ALL your clients calling a page with this counter will end up waiting for this counter to be updated. Sometimes there could be tens or even hundreds of such a queries running on your DB killing your server and breaking an entire application (number of application instances is constant, but more and more instances are locked waiting for a counter).

Read the rest of this entry


FastSessions Rails Plugin Released
6 Feb2008

How often do we think about our http sessions implementation? I mean, do you know, how your currently used sessions-related code will behave when sessions number in your database will grow up to millions (or, even, hundreds of millions) of records? This is one of the things we do not think about. But if you’ll think about it, you’ll notice, that 99% of your session-related operations are read-only and 99% of your sessions writes are not needed. Almost all your sessions table records have the same information: session_id and serialized empty session in the data field.

Looking at this sessions-related situation we have created really simple (and, at the same time, really useful for large Rails projects) plugin, which replaces ActiveRecord-based session store and makes sessions much more effective. Below you can find some information about implementation details and decisions we’ve made in this plugin, but if you just want to try it, then check out our project site.

Read the rest of this entry