Innodb Locks, ActiveRecord and acts_as_ferret Problem

Posted in: Databases, Development
Tags: ferret, MySQL, problem, rails, scalability

1 Dec2007

Last few days one of our customers (one of the largest Ruby on Rails sites on the Net) was struggling to solve some really strange problem – once upon a time they were getting an error from ActiveRecord on their site:

1	(ActiveRecord::StatementInvalid) "Mysql::Error: Lock wait timeout exceeded; try restarting transaction: UPDATE some_table.....

They have innodb_lock_wait_timeout set to 20 seconds. After a few hours of looking for strange transactions we were decided to create s script to dump SHOW INNODB STATUS and SHOW FULL PROCESSLIST commands output to a file every 10 seconds to catch one of those moments when this error occurred.

Today we’ve got next error and started digging in our logs…

After a couple of hours of looking in logs, parsing binary logs from MySQL and thinking we’ve found a following image of the situation:

1
2
3
4
5
6
7
8
9

# at 804649039
#071130 14:56:25 server id 4 end_log_pos 1011 Query thread_id=71318 exec_time=3 error_code=0

SET TIMESTAMP=1196456185/*!*/;
UPDATE some_table ....

# at 804650050
#071130 14:57:02 server id 4 end_log_pos 804650077 Xid = 220437663
COMMIT/*!*/;

This transaction was locking a record and caused all those errors. We’ve took a look at project sources and didn’t found any transactions used there at all. So, it was obvious this problem transaction (with one query in it) was generated by ActiveRecord.

We’ve discussed this problem with our customer and started searching for a reason, why would AR wait for 37 seconds between a query and COMMIT command.

At the end we’ve found a reason. Customer uses acts_as_ferret for search on their site. Site has a huge documents database so ferret index updates became pretty slow. In acts_as_ferret index updates are called from a following AR hooks:

1
2
3
4
5

...
after_create :ferret_create
after_update :ferret_update
after_destroy :ferret_destroy
...

Next step was obvious – we’ve created a test project with a simple test model:

1
2
3
4
5
6
7
8

class LockTest < ActiveRecord::Base
after_update :my_lock
def my_lock
logger.debug("Starting sleep for 10 sec")
sleep(10)
logger.debug("Wake up!")
end
end

Next – one test update of the data and the results in development.log:

1
2
3
4
5

SQL (0.000093) BEGIN
LockTest Update (0.000374) UPDATE lock_tests SET `created_at` = '2007-11-30 20:14:28', `updated_at` = '2007-11-30 20:14:43', `a` = 2 WHERE `id` = 1
Starting sleep for 10 sec
Wake up!
SQL (0.010107) COMMIT

So, maybe we were reading AR docs in a wrong way, or there is no such info, but :after_update callback is called BEFORE the actual data COMMITed to the DB! In our customer’s case, some user saved the document, ferret stated updating its indexes, record was locked for 30+ seconds and every updates to the same record (next tries to click to a Save button, for example) were failing with timeout errors.

So far so good, we’re going to move to Sphinx and hope there will be no such crazy errors in future.

Homo-Adminus Blog

Yet Another Admin’s Blog