Category: Admin-tips
Interesting Resources for Technical Operations Engineers
23 Sep 2013

As you may have already heard, I am looking for good techops engineers to join my team at Swiftype. This process involves a lot of interviews with candidates, and during those interviews, along with many challenging practical questions, I really love to ask things like “What are the most important resources you think an operations engineer should follow?”, “What books, in your opinion, are must-reads for a techops engineer?” or “Who are your personal heroes in the IT community?”. Those questions give me a lot of information about candidates: their experience, whom they look up to in the community, what they are interested in, and whether they are actively working on improving their professional level.

Recently, one of the candidates asked me to share my lists with him, and I thought this information could be valuable to other people, so I have decided to share it here on my blog.


Must-Read Books List

First of all, I would like to share a list of books I believe every professional in our field should read at some point in their life. You may notice that many of these books are not too technical or are not really related to the pure systems administration part of a techops job. I still think they are very important, because technical operations work at senior levels involves much more than just making sure things work as expected. A lot of it comes down to time management, crisis management and many other topics that are equally important for a professional in this field.

So, here is the list (in no particular order, grouped by topic):

Systems and Networks Administration

Technical Operations, Architecture, Scalability

Project, Release and Time Management

Other

For more information on interesting books for technical operations engineers, you can check out the following book lists on GoodReads:


Interesting Conferences

Conferences, in my opinion, are an essential part of the professional development of any engineer. Here is a list of conferences that could be useful for techops engineers:

  • Surge Conference – in my opinion, this is one of the best conferences dedicated to building and maintaining large web architectures. If I were to choose one conference a year to go to, it would definitely be Surge. Videos from previous years are freely available online: 2010, 2011, 2012. 2013 videos should be available soon as well.
  • O’Reilly’s Velocity Conference – the biggest and probably the oldest web operations and web performance event. In my opinion, it has recently become too focused on web frontend performance, though it is still a really interesting event. Complete video compilations from the conference are available for sale: 2011, 2012, 2013.
  • Monitorama Conference – a pretty new but already very popular conference with interesting content for everyone interested in monitoring (which most ops engineers are). Slides and videos from the first-ever Monitorama conference in 2013 are available online.
  • Percona Live Conference – a really awesome event for anybody who has MySQL in their stack. It is a huge multi-track event with talks from the best and brightest people in the MySQL community. Slides and keynote videos from the 2013 event are available online.
  • DevOps Days – small events happening all around the world and becoming more and more popular. The major topics of these conferences are the DevOps movement and related team/project management practices. Videos and slides from some of the events are available online.

Even if you do not have time to watch any of those conference videos, I think every operations engineer out there would really enjoy the 2011 Surge Conference closing plenary session video, where Theo Schlossnagle (one of my personal heroes in the IT community) describes a typical debugging session many of us go through every once in a while:


Interesting Web Resources

And last, but certainly not least, I would like to share a list of web resources I like to follow to stay up to date on the most recent news and fresh ideas within the web operations community and related areas:

Leading Industry Sites and Blogs

  • MySQL Performance Blog from Percona – one of the best resources on MySQL performance
  • High Scalability – awesome resource with a lot of great articles on scalability, performance and design of large scale systems
  • Kitchen Soap – Blog by John Allspaw (another of my personal heroes in the IT field)
  • DevOps Community Planet – feed/news aggregator for the DevOps community
  • DevOps Community on Reddit – not too active, but still a useful resource for getting interesting news
  • Agile Sysadmin – Blog of Stephen Nelson-Smith
  • obfuscurity – Blog by Jason Dixon, maintainer of Graphite and author of Descartes, Tasseo and other useful tools for collecting and displaying metrics
  • The Agile Admin – Many interesting thoughts on agile web operations and devops
  • Operation Bootstrap – Blog of Aaron Nichols talking about many different aspects of working in operations

Engineering Blogs of Large Web Companies

Podcasts

  • Changelog – member-supported podcast on the 5by5 network talking about interesting open source projects
  • Food Fight – bi-weekly podcast for the Chef community
  • DevOps Cafe – interviews with interesting members of the DevOps community
  • The Ship Show – twice-monthly podcast featuring discussion on everything from build engineering to DevOps to release management, plus interviews, new tools and techniques, and reviews

And this is it! I hope these lists will be useful for young engineers going into technical operations and for people who already work in this space. I am going to try to regularly update this post in the future to make sure it stays relevant for a long time.

P.S. Once again, if you are looking for a job in technical operations, please consider joining my team at Swiftype!


Adding Custom Hive SerDe and UDF Libraries to Cloudera Hadoop 4.3
26 Jul 2013

Yet another small note about Cloudera Hadoop Distribution 4.3.

This time I needed to deploy some custom JAR files to our Hive cluster so that we wouldn’t need to run “ADD JAR” commands in every Hive job (especially useful when using the HiveServer API).

Here is the process of adding a custom SerDe or UDF jar to your Cloudera Hadoop cluster (with a quick smoke test at the end):

  • First, we built our JSON SerDe and got a json-serde-1.1.6.jar file.
  • To make this file available to Hive CLI tools, we need to copy it to /usr/lib/hive/lib on every server in the cluster (I have prepared an rpm package to do just that).
  • To make sure Hive map-reduce jobs can read and write JSON tables, we need to copy the JAR file to the /usr/lib/hadoop/lib directory on all task tracker servers in the cluster (the same rpm does that).
  • And the last, really important step: to make sure your TaskTracker servers know about the new jar, you need to restart the tasktracker services (we use Cloudera Manager, so that was just a few mouse clicks ;-))
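
Once the jars are in place, a quick way to verify the deployment is to create a throwaway table that uses the SerDe without any “ADD JAR” statements. Here is a minimal sketch (the table name is made up, and I am assuming the org.openx.data.jsonserde.JsonSerDe class name from the popular Hive-JSON-Serde project — substitute whatever class your jar actually provides):

$ hive -e "CREATE TABLE json_serde_smoke_test (id INT, name STRING) \
           ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';"
$ hive -e "DROP TABLE json_serde_smoke_test;"

If Hive cannot load the SerDe class on some machine, the jar most likely did not make it into /usr/lib/hive/lib there.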

And this is it for today.


MySQL Monitoring With Cacti Using Percona Monitoring Plugins (1-minute resolution)
26 Jun 2013

Today, just like many times before, I needed to configure a monitoring server for MySQL using Cacti and the awesome Percona Monitoring Templates. The only difference was that this time I wanted to get it to run with 1-minute resolution (using Ganglia and Graphite, both with 10-second resolution, for all the rest of our monitoring at Swiftype has really spoiled me!). And that’s where the usual pain-in-the-ass Cacti configuration gets really amplified by the million things you need to change to make it work. So, this is a short checklist post for those who need to configure a Cacti server with 1-minute resolution and set up Percona Monitoring Plugins on it.
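
As a teaser, here is the very first item on that checklist: by default, Cacti’s poller is started from cron every 5 minutes, so the cron entry needs to be switched to run every minute (a sketch, assuming a stock install with the poller at /var/www/cacti/poller.php and an /etc/cron.d-style crontab; the poller and cron interval values in Cacti’s settings have to be changed to match):

# before (the usual default):
# */5 * * * * cacti php /var/www/cacti/poller.php > /dev/null 2>&1
# after (1-minute polling):
* * * * * cacti php /var/www/cacti/poller.php > /dev/null 2>&1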

Read the rest of this entry


Adding LZO Support to Cloudera Hadoop Distribution 4.3
13 Jun2013

Just a short note to myself and others who need to add LZO support for CDH 4.3.

First of all, you need to build hadoop-lzo. Since CDH 4.3 uses hadoop 2.0, most forks of the hadoop-lzo project fail to compile against the new libraries. After some digging, I found the original twitter hadoop-lzo branch to be the best maintained, and it works perfectly with hadoop 2.0. So, download it, install the prerequisites, and build it (a rough sketch follows below).
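
For reference, the whole build boils down to something like this (a sketch: I am assuming the twitter/hadoop-lzo repository on GitHub and its Maven-based build, with the native LZO library and headers installed from your distribution’s packages):

$ yum install lzo lzo-devel      # native LZO library + headers (pre-requisites)
$ git clone https://github.com/twitter/hadoop-lzo.git
$ cd hadoop-lzo
$ mvn clean package              # builds the jar and the native libs under target/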

I have built it for us as an RPM; you can check out the spec file here (it depends on some other packages from that repo, but you should get the idea and be able to modify the script to build on vanilla Red Hat Linux without additional packages). Another option is to take a look at Cloudera’s GPL Extras repository and their lzo packages and documentation.

After you have built and installed your LZO libraries, you should be able to use them with HBase without any additional configuration. To test HBase support for LZO compression you could use the following command:

$ hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/testfile lzo
13/06/13 04:43:14 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
13/06/13 04:43:14 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
13/06/13 04:43:14 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
13/06/13 04:43:14 DEBUG util.FSUtils: Creating file=file:/tmp/testfile with permission=rwxrwxrwx
13/06/13 04:43:15 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
13/06/13 04:43:15 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path file:/tmp/testfile. Expecting at least 5 path components.
13/06/13 04:43:15 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/06/13 04:43:15 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 64cec2e0439bd92a0a6bf3af28f5015a6836fc32]
13/06/13 04:43:15 INFO compress.CodecPool: Got brand-new compressor [.lzo_deflate]
13/06/13 04:43:15 DEBUG hfile.HFileWriterV2: Initialized with CacheConfig:disabled
13/06/13 04:43:15 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path file:/tmp/testfile. Expecting at least 5 path components.
13/06/13 04:43:15 INFO compress.CodecPool: Got brand-new decompressor [.lzo_deflate]
SUCCESS

You’re looking for that last line to say SUCCESS. If the test fails, the output will tell you what went wrong.

Now, if you want to use LZO for map-reduce jobs, you need to make a few changes in your /etc/hadoop/conf/core-site.xml config file. If you manage your configuration yourself, just add the following to your configuration file:

<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.BZip2Codec,
    org.apache.hadoop.io.compress.DeflateCodec,
    org.apache.hadoop.io.compress.SnappyCodec,
    org.apache.hadoop.io.compress.Lz4Codec,
    com.hadoop.compression.lzo.LzoCodec,
    com.hadoop.compression.lzo.LzopCodec
  </value>
</property>

<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

If you’re managing your configuration with Cloudera Manager, you need to do the following:

  1. Go to your map-reduce service
  2. Click “Configuration” and select “View and Edit”
  3. In the list on the left, select “Gateway (Default)” and “Compression”
  4. Add two items to the list of compression codecs: com.hadoop.compression.lzo.LzoCodec and com.hadoop.compression.lzo.LzopCodec
  5. Open “Service Wide” => “Advanced” in the list on the left
  6. Add the following configuration to your “MapReduce Service Configuration Safety Valve for mapred-site.xml” section:
    <property>
      <name>io.compression.codec.lzo.class</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
  7. Click “Save Changes”
  8. Restart your map-reduce cluster with the updated configuration

Now you should be able to use LZO in your map-reduce, Hive and Pig jobs.
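
One more thing worth remembering: a plain .lzo file is not splittable, so a large input would be processed by a single mapper unless you create an LZO index for it first. The hadoop-lzo jar ships with an indexer for exactly that purpose (a sketch — the jar path and the HDFS file name below are just examples, adjust them for your install):

$ hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer /data/logs/access_log.lzo

This runs a small map-reduce job that writes an access_log.lzo.index file next to the original, after which map-reduce jobs can split the file across many mappers.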


Momentum MTA Performance Tuning Tips
7 Jan 2012

This post is being constantly updated as we find out more useful information on Momentum tuning. Last update: 2012-05-05.

About 2 months ago I joined the LivingSocial technical operations team, and one of my first tasks there was to figure out a way to make our MTAs perform better and deliver faster. We use a really great product called Momentum MTA (formerly Ecelerity), and it is really fast, but it is always good to squeeze out as much performance as possible, so I started looking for ways to make our system faster.

While working on it, I created a set of scripts to integrate Momentum with Graphite for all kinds of crazy stats graphing. Those scripts will be open-sourced soon, but for now I have decided to share a few tips about the performance-related changes we made that improved our performance at least 2x:

Read the rest of this entry