Bounces-handler Released
3 Aug 2008

Today I finished the initial version of our bounces-handler package, which we use for mailing-related work at Scribd.

The bounces-handler package is a simple set of scripts that automatically process email bounces and ISP feedback loop emails and maintain your mailing blacklists, plus a Rails plugin for using those blacklists in your RoR applications.

This software was developed as part of a broader effort to improve mailing quality at Scribd.com, and it turned out to be one of the most critical steps after setting up reverse DNS records, DKIM, and SPF.

The package itself consists of two parts:

  • Perl scripts to process incoming email:
    • bounces processor: can be set up to process all your bounce emails
    • feedback loop messages processor: more specific to Scribd, but could still be modified for your needs (will be released soon)
  • Rails plugin to work with mailing blacklists

For more information, please check our README file. If you have any questions, comments, or suggestions, please leave them here as comments and I’ll try to reply as soon as possible.


Found an Ideal I/O Scheduler for my MySQL boxes
20 Jul 2008

Today I was doing some work on one of our database servers (each of them has 4 SAS disks in RAID10 on an Adaptec controller) that required a huge multi-threaded, I/O-bound read load. Basically it was a set of parallel full-scan reads from a 300 GB compressed InnoDB table (yes, we use the InnoDB plugin). Looking at iostat I saw the expected picture: 90-100% disk utilization and lots of read operations per second. Then I decided to play around with the Linux I/O schedulers and try to increase the throughput of the disk subsystem. Here are the results:

Read the rest of this entry
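
For anyone who wants to try a similar experiment, here is a minimal sketch of how the active scheduler can be inspected on Linux. It assumes the standard sysfs file /sys/block/<dev>/queue/scheduler, which lists the available schedulers with the active one in square brackets. The device name sda and the helper name active_scheduler are only placeholders for illustration, not something taken from my setup.

```shell
# Print the active scheduler from a line in the format used by
# /sys/block/<dev>/queue/scheduler, e.g. "noop anticipatory deadline [cfq]".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Hypothetical device name; adjust it for your system.
dev=sda
sched_file="/sys/block/$dev/queue/scheduler"

if [ -r "$sched_file" ]; then
    echo "active scheduler on $dev: $(active_scheduler < "$sched_file")"
    # Switching the scheduler requires root, for example:
    #   echo deadline > "$sched_file"
fi
```

A change made this way applies to one device only and does not survive a reboot, so it is easy to compare schedulers under the same load and go back afterwards.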


Using Sphinx for Non-Fulltext Queries
19 May 2008

How often do you think about the reasons why your favorite RDBMS sucks? 🙂 Over the last few months I have been doing this quite often, and yes, my favorite RDBMS is MySQL. The reason is that one of my recent tasks at Scribd was fixing scalability problems in document browsing.

The problem with browsing was simple to describe and hard to fix: we have a large data set which consists of a few tables with many fields that have really bad selectivity (flag fields like is_deleted and is_private, plus file_type, language_id, category_id and others). As a result, it becomes really hard (if possible at all) to display document lists like “most popular 1-10 page PDF documents in Italian from the category ‘Business’” (non-deleted, non-private, etc., of course). If you try to create appropriate indexes for each possible combination of filters, you’ll end up with tens or hundreds of indexes, and every INSERT query on your tables will take ages.

Read the rest of this entry


Command Line History
28 Apr 2008

Inspired by Rail Spikes:

bash-3.2$ history 1000 | awk '{a[$2]++}END{for(i in a){print a[i] " " i}}' | sort -rn | head
228 cd
167 git
10 ssh
10 DEPLOY=production
6 sudo
6 pwd
6 ./script/import_views.rb
5 rm
4 rake
4 mv
bash-3.2$

Really interesting stats. I would never have guessed that I use git more than ssh on my desktop (I’m a remote worker and a MySQL consultant, so I use ssh really often). 🙂
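
The one-liner works by counting the second field of each history line (the command name) in an awk array and then sorting the counts in descending order. The same idea can be sketched as a small reusable function; the name count_commands and the optional limit argument are my own additions for illustration.

```shell
# Count how often each command appears in `history`-style input
# (lines of the form "<number> <command> [args...]") read from stdin.
# The optional first argument limits the number of rows (default: 10).
count_commands() {
    awk '{ counts[$2]++ } END { for (cmd in counts) print counts[cmd], cmd }' |
        sort -rn | head -n "${1:-10}"
}

# Usage in an interactive bash session:
#   history 1000 | count_commands
```

Note that only the first word of each line is counted, which is why a variable assignment prefix such as DEPLOY=production shows up as a "command" in the output above.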


MySQL UC 2008 Presentations
18 Apr 2008

Since I wasn’t able to get to this year’s MySQL UC (changing employers caused problems with my US visa application and I didn’t get the visa in time), I’m really interested in all the presentations people are posting after their sessions. I decided to collect them all in one place and share them with others. Maybe someone will find it interesting to read what people have to say about the many aspects of MySQL usage.

So, I’ve created a folder in my Scribd.com account which you can use (and track with an RSS reader) to find out which interesting presentations have been published. You can use either my account or the mysqluc08 folder there. Another way to track mysqluc presentations and documents is our tagging (I tag all my docs with the mysqluc08 tag).