This article was originally posted on the Swiftype Engineering blog.
For any modern technology company, a comprehensive application test suite is an absolute necessity. Automated testing suites allow developers to move faster without sacrificing code quality or system stability. Software development has benefited greatly from the adoption of automated testing frameworks and methodologies; however, the culture of automated testing has neglected one key area of the modern web application serving stack: edge routing and multiplexing rulesets.
From modern load balancer appliances with Tcl-based rule sets, to locally or remotely hosted Varnish VCL rules, to the power and flexibility that Nginx and OpenResty expose through Lua, edge routing rulesets have become a vital part of application serving controls.
Over the past decade or so, it has become possible to incorporate more and more logic into edge web server infrastructures. Almost every modern web server supports scripting, enabling developers to make their edge servers smarter than ever before. Unfortunately, the logic configured within web servers is often much harder to test than logic hosted directly in application code, so software teams too often resort to manual testing, or worse, treat their customers as testers by shipping changes to production without any edge routing testing.
In this post, I would like to explain the approach Swiftype has taken to ensure that our test suites cover the complex edge web server logic we use to manage our production traffic flow, so that we can confidently deploy changes to our application infrastructure with little or no risk.
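To make the idea more concrete, here is a purely illustrative sketch (not Swiftype’s actual approach, and with a made-up hostname and header) of the kind of HTTP-level assertion such a test suite might make against a staging edge server:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class EdgeRoutingSmokeTest {
    public static void main(String[] args) throws Exception {
        // Hypothetical staging endpoint served by the edge routing layer
        URL url = new URL("https://staging-edge.example.com/api/v1/health");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setInstanceFollowRedirects(false); // we want to see the edge's own response

        int status = conn.getResponseCode();
        // Hypothetical debug header the edge layer could set to expose its routing decision
        String upstream = conn.getHeaderField("X-Upstream");

        // Assert that the routing rule sent this path to the API backend
        if (status != 200 || !"api-backend".equals(upstream)) {
            throw new AssertionError("Unexpected routing: status=" + status + ", upstream=" + upstream);
        }
        System.out.println("Edge routing OK");
    }
}
```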
As a leader of a technical operations team, I often work on hiring technical operations engineers. This process involves a lot of interviews, and during those interviews, along with many challenging practical questions, I really love to ask things like “What are the most important resources you think an Operations Engineer should follow?”, “What books, in your opinion, are must-reads for a techops engineer?” or “Who are your personal heroes in the IT community?”. Those questions tell me a lot about candidates: their experience, who they look up to in the community, what they are interested in, and whether they are actively working on improving their professional level.
Recently, one of the candidates asked me to share my lists with him, and I thought this information could be valuable to other people, so I have decided to share it here on my blog.
Yet another small note about Cloudera Hadoop Distribution 4.3.
This time I needed to deploy some custom JAR files to our Hive cluster so that we wouldn’t need to run “ADD JAR” commands in every Hive job (especially useful when using the HiveServer API).
Here is the process of adding a custom SerDE or a UDF jar to your Cloudera Hadoop cluster:
- First, we built our JSON SerDe and got a json-serde-1.1.6.jar file.
- To make this file available to Hive CLI tools, we need to copy it to /usr/lib/hive/lib on every server in the cluster (I have prepared an RPM package to do just that).
- To make sure Hive map-reduce jobs are able to read/write JSON tables, we need to copy the same JAR file to the /usr/lib/hadoop/lib directory on all TaskTracker servers in the cluster (the same RPM does that).
- And the last, really important step: to make sure your TaskTracker servers know about the new jar, you need to restart the TaskTracker services (we use Cloudera Manager, so that was just a few mouse clicks ;-))
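For context, here is a minimal, hypothetical example of the kind of code that ends up in such a jar (not the SerDe from this post, just a classic Hive UDF that lower-cases a string). Build it into a jar, deploy it as described above, and it can be registered with CREATE TEMPORARY FUNCTION my_lower AS 'LowerCaseUDF'; without any ADD JAR statements:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example UDF; any custom SerDe or UDF jar is deployed the same way
public final class LowerCaseUDF extends UDF {
    // Hive calls evaluate() once per row; returning null propagates SQL NULL
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toLowerCase());
    }
}
```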
And this is it for today.
Today, just like many times before, I needed to configure a monitoring server for MySQL using Cacti and the awesome Percona Monitoring Templates. The only difference was that this time I wanted it to run with 1-minute resolution (using Ganglia and Graphite, both with 10-second resolution, for all the rest of our monitoring at Swiftype has really spoiled me!). And that’s where the usual pain-in-the-ass Cacti configuration gets amplified by the million things you need to change to make it work. So, this is a short checklist post for those who need to configure a Cacti server with 1-minute resolution and set up the Percona Monitoring Plugins on it.
Just a short note to myself and others who need to add LZO support for CDH 4.3.
First of all, you need to build hadoop-lzo. Since CDH 4.3 uses Hadoop 2.0, most forks of the hadoop-lzo project fail to compile against the new libraries. After some digging, I found the original Twitter hadoop-lzo branch to be the best maintained, and it works perfectly with Hadoop 2.0. So, download it, install the prerequisites, and build it.
I have built it for us as an RPM; you can check out the spec file here (it depends on some other packages from that repo, but you should get the idea and be able to modify the script to build on vanilla Red Hat Linux without additional packages). Another option would be to take a look at Cloudera’s GPL Extras repository and their LZO packages and documentation.
After you have built and installed the LZO libraries, you should be able to use them with HBase without any additional configuration. To test HBase support for LZO compression, you can use the following command:
```
$ hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/testfile lzo
13/06/13 04:43:14 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
13/06/13 04:43:14 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
13/06/13 04:43:14 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
13/06/13 04:43:14 DEBUG util.FSUtils: Creating file=file:/tmp/testfile with permission=rwxrwxrwx
13/06/13 04:43:15 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
13/06/13 04:43:15 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path file:/tmp/testfile. Expecting at least 5 path components.
13/06/13 04:43:15 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/06/13 04:43:15 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 64cec2e0439bd92a0a6bf3af28f5015a6836fc32]
13/06/13 04:43:15 INFO compress.CodecPool: Got brand-new compressor [.lzo_deflate]
13/06/13 04:43:15 DEBUG hfile.HFileWriterV2: Initialized with CacheConfig:disabled
13/06/13 04:43:15 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path file:/tmp/testfile. Expecting at least 5 path components.
13/06/13 04:43:15 INFO compress.CodecPool: Got brand-new decompressor [.lzo_deflate]
SUCCESS
```
You’re looking for that last line to say SUCCESS. If it fails, it means you did something wrong, and the output will tell you what that is.
Now, if you want to use LZO for map-reduce jobs, you need to make a few changes in your /etc/hadoop/conf/core-site.xml config file. If you manage your configuration yourself, just add the following to your configuration file:
```xml
<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.BZip2Codec,
    org.apache.hadoop.io.compress.DeflateCodec,
    org.apache.hadoop.io.compress.SnappyCodec,
    org.apache.hadoop.io.compress.Lz4Codec,
    com.hadoop.compression.lzo.LzoCodec,
    com.hadoop.compression.lzo.LzopCodec
  </value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```
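To sanity-check that the codecs are actually being picked up from this configuration, a quick check along these lines could help (just a sketch for illustration; it assumes the Hadoop client jars, the hadoop-lzo jar, and your core-site.xml are all on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class LzoCodecCheck {
    public static void main(String[] args) {
        // Loads core-site.xml (and the io.compression.codecs list) from the classpath
        Configuration conf = new Configuration();
        System.out.println("io.compression.codecs = " + conf.get("io.compression.codecs"));

        // The factory maps file extensions to codecs; .lzo should resolve to LzopCodec
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(new Path("/tmp/testfile.lzo"));
        System.out.println("Codec for .lzo files: "
                + (codec == null ? "NOT FOUND" : codec.getClass().getName()));
    }
}
```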
If you’re managing your configuration with Cloudera Manager, you need to do the following:
- Go to your map-reduce service
- Click “Configuration” and select “View and Edit”
- In the list on the left, select “Gateway (Default)” and “Compression”
- Add two items to the list of compression codecs: com.hadoop.compression.lzo.LzoCodec and com.hadoop.compression.lzo.LzopCodec
- Open “Service Wide” => “Advanced” in the list on the left
- Add the following configuration to your “MapReduce Service Configuration Safety Valve for mapred-site.xml” section:
```xml
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```
- Click “Save Changes”
- Restart your map-reduce cluster with the updated configuration
Now you should be able to use LZO in your map-reduce, Hive, and Pig jobs.
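For example, a map-reduce job can now write LZO-compressed output. Here is a minimal sketch (my own example, not from the original post) of an identity job that re-emits its input compressed with the LZOP codec registered above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import com.hadoop.compression.lzo.LzopCodec;

public class LzoOutputJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "lzo-output-example");
        job.setJarByClass(LzoOutputJob.class);

        // Identity mapper, no reducers: re-emit each input line (keyed by byte offset)
        // into LZO-compressed output files
        job.setMapperClass(Mapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Compress the job output with the LZOP codec we registered in core-site.xml
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, LzopCodec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```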