Innodb Locks, ActiveRecord and acts_as_ferret Problem

Posted by Oleksiy Kovyrin under Databases, Development

Last few days one of our customers (one of the largest Ruby on Rails sites on the Net) was struggling to solve some really strange problem – once upon a time they were getting an error from ActiveRecord on their site:

1
(ActiveRecord::StatementInvalid) "Mysql::Error: Lock wait timeout exceeded; try restarting transaction: UPDATE some_table.....

They have innodb_lock_wait_timeout set to 20 seconds. After a few hours of looking for strange transactions we were decided to create s script to dump SHOW INNODB STATUS and SHOW FULL PROCESSLIST commands output to a file every 10 seconds to catch one of those moments when this error occurred.

Today we’ve got next error and started digging in our logs…

After a couple of hours of looking in logs, parsing binary logs from MySQL and thinking we’ve found a following image of the situation:

1
2
3
4
5
6
7
8
9
# at 804649039                                                                                                                              
#071130 14:56:25 server id 4  end_log_pos 1011  Query   thread_id=71318 exec_time=3     error_code=0

SET TIMESTAMP=1196456185/*!*/;    
UPDATE some_table ....

# at 804650050                                                                                                                              
#071130 14:57:02 server id 4  end_log_pos 804650077     Xid = 220437663                                                                    
COMMIT/*!*/;

This transaction was locking a record and caused all those errors. We’ve took a look at project sources and didn’t found any transactions used there at all. So, it was obvious this problem transaction (with one query in it) was generated by ActiveRecord.

We’ve discussed this problem with our customer and started searching for a reason, why would AR wait for 37 seconds between a query and COMMIT command.

At the end we’ve found a reason. Customer uses acts_as_ferret for search on their site. Site has a huge documents database so ferret index updates became pretty slow. In acts_as_ferret index updates are called from a following AR hooks:

1
2
3
4
5
...
after_create  :ferret_create
after_update  :ferret_update
after_destroy :ferret_destroy
...

Next step was obvious – we’ve created a test project with a simple test model:

1
2
3
4
5
6
7
8
class LockTest < ActiveRecord::Base
 after_update :my_lock
 def my_lock
   logger.debug("Starting sleep for 10 sec")
   sleep(10)
   logger.debug("Wake up!")
 end
end

Next – one test update of the data and the results in development.log:

1
2
3
4
5
SQL (0.000093)   BEGIN
 LockTest Update (0.000374)   UPDATE lock_tests SET `created_at` = '2007-11-30 20:14:28', `updated_at` = '2007-11-30 20:14:43', `a` = 2 WHERE `id` = 1
Starting sleep for 10 sec
Wake up!
 SQL (0.010107)   COMMIT

So, maybe we were reading AR docs in a wrong way, or there is no such info, but :after_update callback is called BEFORE the actual data COMMITed to the DB! In our customer's case, some user saved the document, ferret stated updating its indexes, record was locked for 30+ seconds and every updates to the same record (next tries to click to a Save button, for example) were failing with timeout errors.

So far so good, we're going to move to Sphinx and hope there will be no such crazy errors in future.


Related posts:

  1. InnoDB Recovery toolset Version 0.3 Released
  2. Data Recovery Toolkit for InnoDB Released
  3. DB Charmer – ActiveRecord Connection Magic Plugin
  4. Small Tip: How to Enable ActiveRecord Logging in Merb
  5. Dog-pile Effect and How to Avoid it with Ruby on Rails memcache-client Patch

8 Responses to this entry

Scoundrel says:

Frank,

AFAIU, they simply did not hear about nginx back then and started using what everyone else in Rails community was using – ferret. Later many people I know migrated away from ferret, but when our client was creating his service, ferret was one of the most popular SE solutions which have Ruby client libraries and/or Rails plugins.

Scoundrel says:

At this point we’ve just decided to stick to the technology we’ve used to. I mean both customer and me use sphinx in many places and sphinx integration process looks much easier for us at this moment.

Michael Klishin says:

Nice. Has it been filed to Rails Trac? I’m actually digging AR sources and trying to help core team with fixes. So should I create ticket for this first or can you please point me to it?

There’s about 10 000 tickets in their Trac :(

Scoundrel says:

2Michael: I’ve never filled it as a bug, cause’ I wasn’t sure that it is a bug. Feel free to file it in rails trac if you think it is.