About 6 years ago (feels like an eternity in Rails world) working at Scribd I’ve started working on porting our codebase from some old version or Rails to a slightly newer one. That’s when I realized, that there wasn’t a ruby gem to help us manage MySQL connections for our vertically sharded databases (different models on different servers). I’ve started hacking on some code to replace whatever we were using back then, finished the first version of the migration branch and then decided to open the code for other people to use. That’s how the DbCharmer ruby gem was born.
For the next few years a lot of new functionality we needed has been added to the gem, making it more complex and immensely more powerful. I’ve enjoyed working on it, developing those features, contributing to the community. But then I left Scribd, stopped being a user of DbCharmer and the situation drastically changed. For quite some time (years) I would keep fighting to make the code work with newer and newer versions of Rails, struggling to wrap my head around more and more (sometimes useless) abstractions Rails Core team decided to throw into ActiveRecord.
Finally, in the last 2 years (while trying to make DbCharmer compatible with Rails 4.0) it has become more and more apparent, that I simply do not want to do this anymore. I do not need DbCharmer to support Rails 4.0+, while it is very clear that many users need it and constant nagging in the issues and the mailing list, asking for updates generated a lot of anxiety for me, anxiety I couldn’t do much about (the worst kind). As the result, since I simply do not see any good reasons to keep fighting this uphill battle (and developing stuff like this for ActiveRecord IS a constant battle!) I officially give up.
Read the rest of this entry →
- Posted in: Admin-tips, Databases, Development
- Tags: cloudera, custom, hadoop, hive, jar, java, json, serde, udf
Yet another small note about Cloudera Hadoop Distribution 4.3.
This time I needed to deploy some custom JAR files to our Hive cluster so that we wouldn’t need to do “
ADD JAR” commands in every Hive job (especially useful when using HiveServer API).
Here is the process of adding a custom SerDE or a UDF jar to your Cloudera Hadoop cluster:
- First, we have built our JSON SerDe and got a
- To make this file available to Hive CLI tools, we need to copy it to
/usr/lib/hive/lib on every server in the cluster (I have prepared an rpm package to do just that).
- To make sure Hive map-reduce jobs would be able to read/write JSON tables, we needed to copy our JAR file to
/usr/lib/hadoop/lib directory on all task tracker servers in the cluster (the same rpm does that).
- And last, really important step: To make sure your TaskTracker servers know about the new jar, you need to restart your tasktracker services (we use Cloudera Manager, so that was just a few mouse clicks ;-))
And this is it for today.
This week, after 3 months in the works, we’ve finally released version 1.7.0 of DbCharmer ruby gem – Rails plugin that significantly extends ActiveRecord’s ability to work with multiple databases and/or database servers by adding features like multiple databases support, master/slave topologies support, sharding, etc.
New features in this release:
- Rails 3.0 support. We’ve worked really hard to bring all the features we supported in Rails 2.X to the new version of Rails and now I’m proud that we’ve implemented them all and the implementation looks much cleaner and more universal (all kinds of relations in rails 3 work in exactly the same way and we do not need to implement connection switching for all kinds of weird corner-cases in ActiveRecord).
- Forced Slave Reads functionality. Now we could have models with slaves that are not used by default, but could be turned on globally (per-controller, per-action or in a block). This is a new feature that brings our master/slave routing capabilities to a really new level – we could now use it for a really mission-critical models on demand and not be afraid of breaking major functionality of our applications by switching them to slave reads.
- Lots of changes were made in the structure of our code and tests to make sure it would be much easier for new developers to understand DbCharmer internals and make changes in its code.
Along with the new release we’ve got a brand new web site. You can find much better, cleaner and, most importantly, correct documentation for the library on the web site. We’ll be adding more examples, will try to add more in-depth explanation of our core functions, etc.
If you have any questions about the release, feel free to ask them in our new mailing list: DbCharmer Users Group.
For more updates on our releases, you can follow @DbCharmer on Twitter.
We use nginx and its features a lot in Scribd. Many times in the last year we needed some pretty interesting, but not supported feature – we wanted nginx X-Accel-Redirect functionality to work with remote URLs. Out of the box nginx supports this functionality for local URIs only. In this short post I want to explain how did we make nginx serve remote content via X-Accel-Redirect.
Read the rest of this entry →
Having a reverse-proxy web cache as one of the major infrastructure elements brings many benefits for large web applications: it reduces your application servers load, reduces average response times on your site, etc. But there is one problem every developer experiences when works with such a cache – cached content invalidation.
It is a complex problem that usually consists of two smaller ones: individual cache elements invalidation (you need to keep an eye on your data changes and invalidate cached pages when related data changes) and full cache purges (sometimes your site layout or page templates change and you need to purge all the cached pages to make sure users will get new visual elements of layout changes). In this post I’d like to look at a few techniques we use at Scribd to solve cache invalidation problems.
Read the rest of this entry →