Today I was doing some work on one of our database servers (each of them has 4 SAS disks in RAID10 on an Adaptec controller), and it required a heavy multi-threaded, I/O-bound read load. Basically it was a set of parallel full-scan reads from a 300 GB compressed InnoDB table (yes, we use the InnoDB plugin). Looking at the iostat output I saw pretty much what I expected: 90-100% disk utilization and lots of read operations per second. Then I decided to play around with the Linux I/O schedulers and try to increase disk subsystem throughput. Here are the results:
How often do you think about the reasons why your favorite RDBMS sucks? 🙂 For the last few months I have been doing this quite often and yes, my favorite RDBMS is MySQL. The reason is that one of my recent tasks at Scribd was fixing scalability problems in document browsing.
The problem with browsing was pretty simple to describe and just as hard to fix: we have a large data set which consists of a few tables with many fields that have really bad selectivity (flag fields like is_deleted, is_private, etc; file_type, language_id, category_id and others). As a result, it becomes really hard (if possible at all) to display document lists like “most popular PDF documents of 1-10 pages in Italian from the ‘Business’ category” (non-deleted, non-private, etc., of course). If you try to create appropriate indexes for each possible combination of filters, you’ll end up with tens or hundreds of indexes, and every INSERT query on your tables will take ages.
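To get a feel for how quickly the number of indexes grows, here is a rough sketch that counts one index per non-empty combination of filter columns. The column list is taken from the examples above, but treating each combination as its own index is a simplifying assumption: it ignores column order, sort columns and the fact that one composite index can serve several prefixes.

```python
from itertools import combinations

# Low-selectivity filter columns mentioned in the text (illustrative list only)
FILTER_COLUMNS = ["is_deleted", "is_private", "file_type", "language_id", "category_id"]


def filter_combinations(columns):
    """Return every non-empty subset of filter columns a listing page might use."""
    subsets = []
    for size in range(1, len(columns) + 1):
        subsets.extend(combinations(columns, size))
    return subsets


combos = filter_combinations(FILTER_COLUMNS)
print(len(combos))  # 2**5 - 1 = 31 combinations for just five columns
```

Every extra filter column doubles the count, which is why the "one index per combination" approach stops being practical very quickly.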
Since I wasn’t able to get to this year’s MySQL UC (a change of employer caused problems with my US visa and I didn’t get it in time), I’m really interested in all the presentations people are posting after their sessions. I decided to collect them all in one place and share them with others: maybe someone will find it interesting to read what people have to say about the many aspects of MySQL usage.
So, I’ve created a folder in my Scribd.com account which you can use (and track with an RSS reader) to find out which interesting presentations have been published. You can use either my account or the mysqluc08 folder there. One more option for tracking mysqluc presentations/documents is our tagging (I tag all my docs with the mysqluc08 tag).
Even though I didn’t go to the MySQL conf this year (really sad about this), this week is gonna be the most active one in the community, so I decided to do some community stuff too 🙂 Today I’ve released version 0.3 of our InnoDB recovery toolkit. It is now much faster, more stable and more accurate. At this point it is possible to recover almost any table from a corrupted/deleted tablespace with much less effort than before. Here is a short list of changes (since 0.1 announced here):
- More MySQL data types added: DECIMAL (both old and new), DATE, TIME
- CHAR data type handling improved in table definitions generator
- Index filtering added to page_parser
- 64-bit stat() support added to all tools
- Linux has no isnumber() function so we define our own implementation (pretty simple)
- Lots of fixes in the create_defs.pl script: now it generates definitions which can recover your data in 80% of cases without any changes.
- Min/max record size calculation fixed in constraints-based parser.
- Support for nullable fixed-size columns is fixed.
- Debug logging is much cleaner now.
For the last few days one of our customers (one of the largest Ruby on Rails sites on the Net) was struggling with a really strange problem: every once in a while they were getting an error from ActiveRecord on their site:
(ActiveRecord::StatementInvalid) "Mysql::Error: Lock wait timeout exceeded; try restarting transaction: UPDATE some_table.....
They have innodb_lock_wait_timeout set to 20 seconds. After a few hours of looking for strange transactions we decided to create a script that dumps the output of the SHOW INNODB STATUS and SHOW FULL PROCESSLIST commands to a file every 10 seconds, to catch one of those moments when the error occurs.
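A minimal sketch of such a script could look like the following. This is not the script we actually used: the function names and the log file name are made up for illustration, and it assumes the `mysql` command-line client is on the PATH and picks up credentials from an option file such as `~/.my.cnf`.

```python
import subprocess
import time
from datetime import datetime

# Statements named in the text; newer MySQL versions spell the first one
# SHOW ENGINE INNODB STATUS. The trailing \G asks the client for vertical output.
STATEMENTS = ["SHOW INNODB STATUS\\G", "SHOW FULL PROCESSLIST"]


def build_command(statement, mysql_bin="mysql"):
    """Build the argument list that runs one statement through the mysql client."""
    return [mysql_bin, "-e", statement]


def snapshot(run=subprocess.run, now=datetime.now):
    """Run every statement once and return the combined, timestamped output."""
    parts = [f"===== {now().isoformat()} ====="]
    for statement in STATEMENTS:
        result = run(build_command(statement), capture_output=True, text=True)
        parts.append(f"--- {statement}\n{result.stdout}")
    return "\n".join(parts) + "\n"


def monitor(log_path, interval=10, iterations=None, sleep=time.sleep, **kwargs):
    """Append a snapshot to log_path every `interval` seconds.

    iterations=None means loop until the process is interrupted.
    """
    count = 0
    while iterations is None or count < iterations:
        with open(log_path, "a") as log:
            log.write(snapshot(**kwargs))
        count += 1
        sleep(interval)
```

Calling `monitor("innodb_monitor.log")` would then keep appending snapshots every 10 seconds until interrupted, so the state around the moment of a lock wait timeout can be looked up later by timestamp.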
Today we got the next error and started digging through our logs…