Adding LZO Support to Cloudera Hadoop Distribution 4.3
13 Jun 2013

Just a short note to myself and others who need to add LZO support for CDH 4.3.

First of all, you need to build hadoop-lzo. Since CDH 4.3 uses Hadoop 2.0, most forks of the hadoop-lzo project fail to compile against the new libraries. After some digging I’ve found the original Twitter hadoop-lzo branch to be the most actively maintained, and it works perfectly with Hadoop 2.0. So, download it, install the prerequisites, and build it.
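For the record, here is roughly what the build looked like for me (just a sketch: it assumes a yum-based distro and the ant build the Twitter branch used at the time; package names and build targets may differ for your revision):

$ sudo yum install lzo lzo-devel ant gcc gcc-c++ java-devel
$ git clone https://github.com/twitter/hadoop-lzo.git
$ cd hadoop-lzo
$ ant clean compile-native tar
# copy the jar and the native libraries into your Hadoop lib directories
$ sudo cp build/hadoop-lzo-*.jar /usr/lib/hadoop/lib/
$ sudo cp -r build/native/* /usr/lib/hadoop/lib/native/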

I’ve built it for us as an RPM; you can check out the spec file here (it depends on some other packages from that repo, but you should get the idea and be able to modify the script to build on vanilla Red Hat Linux without additional packages). Another option is to take a look at Cloudera’s GPL Extras repository with their LZO packages and documentation.

After you have built and installed your LZO libraries, you should be able to use them with HBase without any additional configuration. To test HBase support for LZO compression you could use the following command:

$ hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/testfile lzo
13/06/13 04:43:14 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
13/06/13 04:43:14 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
13/06/13 04:43:14 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
13/06/13 04:43:14 DEBUG util.FSUtils: Creating file=file:/tmp/testfile with permission=rwxrwxrwx
13/06/13 04:43:15 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
13/06/13 04:43:15 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path file:/tmp/testfile. Expecting at least 5 path components.
13/06/13 04:43:15 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/06/13 04:43:15 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 64cec2e0439bd92a0a6bf3af28f5015a6836fc32]
13/06/13 04:43:15 INFO compress.CodecPool: Got brand-new compressor [.lzo_deflate]
13/06/13 04:43:15 DEBUG hfile.HFileWriterV2: Initialized with CacheConfig:disabled
13/06/13 04:43:15 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path file:/tmp/testfile. Expecting at least 5 path components.
13/06/13 04:43:15 INFO compress.CodecPool: Got brand-new decompressor [.lzo_deflate]
SUCCESS

You’re looking for that last line to say SUCCESS. If the test fails instead, the error output will tell you what went wrong.
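Once the test passes, you can create LZO-compressed tables right away. A quick sketch from the HBase shell (the table and column family names here are made up):

$ hbase shell
hbase> create 'testtable', {NAME => 'cf1', COMPRESSION => 'LZO'}
hbase> describe 'testtable'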

Now, if you want to use LZO for map-reduce jobs, you need to make a few changes in your /etc/hadoop/conf/core-site.xml config file. If you manage your configuration yourself, just add the following to your configuration file:

<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.BZip2Codec,
    org.apache.hadoop.io.compress.DeflateCodec,
    org.apache.hadoop.io.compress.SnappyCodec,
    org.apache.hadoop.io.compress.Lz4Codec,
    com.hadoop.compression.lzo.LzoCodec,
    com.hadoop.compression.lzo.LzopCodec
  </value>
</property>

<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

If you’re managing your configuration with Cloudera Manager, you need to do the following:

  1. Go to your map-reduce service
  2. Click “Configuration” and select “View and Edit”
  3. In the list on the left select “Gateway (Default)” and “Compression”
  4. Add two items to the list of compression codecs: com.hadoop.compression.lzo.LzoCodec and com.hadoop.compression.lzo.LzopCodec
  5. Open “Service Wide” => “Advanced” in the list on the left
  6. Add the following configuration to your “MapReduce Service Configuration Safety Valve for mapred-site.xml” section:
    <property>
      <name>io.compression.codec.lzo.class</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
  7. Click “Save Changes”
  8. Restart your map-reduce cluster with the updated configuration

Now you should be able to use LZO in your map-reduce, Hive, and Pig jobs.
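For example, you can index existing .lzo files so map-reduce can split them, and enable LZO-compressed job output from the command line. A sketch (the jar locations and HDFS paths below are just examples, adjust them to your installation):

# index an .lzo file so map-reduce can split it instead of feeding it to a single mapper
$ hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer /user/someuser/logs/access.log.lzo

# run a job with LZO-compressed output (MRv1 property names)
$ hadoop jar hadoop-examples.jar wordcount \
    -Dmapred.output.compress=true \
    -Dmapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec \
    /user/someuser/input /user/someuser/output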


New Chapter: Swiftype
31 Jan 2013

So, after a few weeks of looking for a new job, I’m really excited to start my journey at a young but very ambitious startup called Swiftype, which is focused on developing a private site search technology that can be used on everything from small blogs to large product sites. The company is growing really fast, and I’m going to lead all the work on infrastructure, build the ops team, and hopefully get a chance to do some coding along the way.

Stay tuned – I really hope to finally get a chance to do more blogging this year. 🙂


Looking for a New Gig
14 Jan 2013

As of today I’m no longer working for LivingSocial, and I’m looking for the next thing to work on. Since my family is in Toronto and I have an apartment (and a mortgage) here, I’m not looking to relocate; I’m looking for either a remote position (I have 7+ years of remote work experience) or something local in Toronto.

For more information on my background, please check my GitHub profile, my LinkedIn profile, or the resume section on this blog. If you need to contact me, feel free to use any of the channels listed on the contacts page.

Update: After a few initial interviews, I’d like to update this post with a bit more detail on what I’m looking for in a new position.

First of all, I’m really not sure I want to be yet another ops engineer working on “everything ops” at my next company. If I were to join a company as a regular ops engineer, I’d prefer a clearly defined role with a clear focus on a set of challenging problems. I’m honestly tired of setting up Cacti/Nagios/Chef at this point and would like the job to be a bit more challenging.

Even more, though, I’m interested in making strategic technical decisions for an operations team and applying my experience and knowledge to solving challenging problems with a dedicated team of ops engineers. This could be anything from a tech ops team lead role (in a medium or large company) to a director of technical operations (in a small-to-medium-sized startup).

Update: Ok, I’ve found a new job – I work for Swiftype now!


Momentum MTA Performance Tuning Tips
7 Jan 2012

This post is being constantly updated as we find out more useful information on Momentum tuning. Last update: 2012-05-05.

About 2 months ago I joined LivingSocial’s technical operations team, and one of my first tasks was to figure out a way to make our MTAs perform better and deliver faster. We use a really great product called Momentum MTA (formerly Ecelerity), and while it is really fast, it is always good to squeeze out as much performance as possible, so I started looking for ways to make our system faster.

While working on it, I created a set of scripts to integrate Momentum with Graphite for all kinds of crazy stats graphing; those scripts will be open-sourced soon. For now, I’ve decided to share a few tips about the performance-related changes that improved our performance at least 2x.



DbCharmer 1.7.0 Release: Rails 3.0 Support and Forced Slave Reads
1 Sep 2011

This week, after 3 months in the works, we’ve finally released version 1.7.0 of the DbCharmer Ruby gem – a Rails plugin that significantly extends ActiveRecord’s ability to work with multiple databases and/or database servers by adding support for master/slave topologies, sharding, and more.

New features in this release:

  • Rails 3.0 support. We’ve worked really hard to bring all the features we supported in Rails 2.X to the new version of Rails, and I’m proud to say we’ve implemented them all. The implementation is also much cleaner and more universal: all kinds of relations in Rails 3 work in exactly the same way, so we no longer need to implement connection switching for all kinds of weird corner cases in ActiveRecord.
  • Forced slave reads. Models can now have slaves that are not used by default but can be turned on globally, per-controller, per-action, or within a block (see the sketch after this list). This new feature brings our master/slave routing capabilities to a whole new level – we can now enable slave reads for really mission-critical models on demand without being afraid of breaking major functionality of our applications.
  • Lots of changes were made to the structure of our code and tests to make it much easier for new developers to understand DbCharmer internals and change its code.
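To give you an idea of what forced slave reads look like in practice, here is a short sketch based on the gem’s documentation (the model, controller, and connection names are made up):

# A model with a slave connection that is not used by default
class User < ActiveRecord::Base
  db_magic :slave => :slave01, :force_slave_reads => false
end

# Turn slave reads on per-controller/per-action...
class UsersController < ApplicationController
  force_slave_reads :only => [ :index, :show ]
end

# ...or force them for everything inside a block
DbCharmer.force_slave_reads do
  User.find(123)
end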

Along with the new release we’ve launched a brand new web site, where you can find much better, cleaner and, most importantly, correct documentation for the library. We’ll keep adding more examples and more in-depth explanations of our core functions.

If you have any questions about the release, feel free to ask them in our new mailing list: DbCharmer Users Group.

For more updates on our releases, you can follow @DbCharmer on Twitter.