Just a short note to myself and others who need to add LZO support for CDH 4.3.
First of all, you need to build hadoop-lzo. Since CDH 4.3 uses Hadoop 2.0, most forks of the hadoop-lzo project fail to compile against the new libraries. After some digging, I've found the original Twitter hadoop-lzo branch to be the most maintained, and it works perfectly with Hadoop 2.0. So download it, install the prerequisites, and build it.
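If you just want the gist of the build, it boils down to something like this (a rough sketch, not a tested recipe: the package names are for RHEL/CentOS, and the install paths assume a standard CDH layout, so adjust for your setup):

```
# Build prerequisites (RHEL/CentOS package names; adjust for your distro)
sudo yum install -y lzo lzo-devel gcc git maven

# Grab and build the Twitter branch
git clone https://github.com/twitter/hadoop-lzo.git
cd hadoop-lzo
export JAVA_HOME=/usr/java/default   # point at your JDK
mvn clean package

# Drop the results where CDH can see them (typical CDH paths; adjust as needed)
sudo cp target/hadoop-lzo-*.jar /usr/lib/hadoop/lib/
sudo cp -r target/native/Linux-amd64-64/lib/* /usr/lib/hadoop/lib/native/
```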
I have built it as an RPM for us; you can check out the spec file here (it depends on some other packages from that repo, but you should get the idea and should be able to modify the script to build on vanilla Red Hat Linux without additional packages). Another option would be to take a look at Cloudera's GPL Extras repository with their lzo packages and documentation.
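Cloudera's packages save you the build step entirely; the install is roughly this (a sketch assuming their CDH4 package name, hadoop-lzo-cdh4 — check Cloudera's documentation for the exact repo URL for your distro):

```
# Add Cloudera's GPL Extras yum repo first (the repo file is linked from
# their docs), then install the pre-built hadoop-lzo package for CDH4:
sudo yum install -y hadoop-lzo-cdh4
```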
After you have built and installed your LZO libraries, you should be able to use them with HBase without any additional configuration. To test HBase support for LZO compression, you can use the following command:
```
$ hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/testfile lzo
13/06/13 04:43:14 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
13/06/13 04:43:14 INFO util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
13/06/13 04:43:14 INFO util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
13/06/13 04:43:14 DEBUG util.FSUtils: Creating file=file:/tmp/testfile with permission=rwxrwxrwx
13/06/13 04:43:15 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false
13/06/13 04:43:15 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path file:/tmp/testfile. Expecting at least 5 path components.
13/06/13 04:43:15 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/06/13 04:43:15 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 64cec2e0439bd92a0a6bf3af28f5015a6836fc32]
13/06/13 04:43:15 INFO compress.CodecPool: Got brand-new compressor [.lzo_deflate]
13/06/13 04:43:15 DEBUG hfile.HFileWriterV2: Initialized with CacheConfig:disabled
13/06/13 04:43:15 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path file:/tmp/testfile. Expecting at least 5 path components.
13/06/13 04:43:15 INFO compress.CodecPool: Got brand-new decompressor [.lzo_deflate]
SUCCESS
```
You're looking for that last line to say SUCCESS. If it fails, it means you did something wrong, and the output will tell you what that is.
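Once the test passes, you can enable LZO on an actual table from the HBase shell (the table and column family names below are just examples):

```
$ hbase shell
hbase> create 'mytable', {NAME => 'cf', COMPRESSION => 'LZO'}
hbase> describe 'mytable'
```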
Now, if you want to use LZO for map-reduce jobs, you need to make a few changes in your /etc/hadoop/conf/core-site.xml config file. If you manage your configuration yourself, just add the following to it:
```
<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.BZip2Codec,
    org.apache.hadoop.io.compress.DeflateCodec,
    org.apache.hadoop.io.compress.SnappyCodec,
    org.apache.hadoop.io.compress.Lz4Codec,
    com.hadoop.compression.lzo.LzoCodec,
    com.hadoop.compression.lzo.LzopCodec
  </value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```
If you’re managing your configuration with Cloudera Manager, you need to do the following:
- Go to your map-reduce service
- Click “Configuration” and select “View and Edit”
- In the list on the left, select “Gateway (Default)” and “Compression”
- Add two items to the list of compression codecs: com.hadoop.compression.lzo.LzoCodec and com.hadoop.compression.lzo.LzopCodec
- Open “Service Wide” => “Advanced” in the list on the left
- Add the following configuration to your “MapReduce Service Configuration Safety Valve for mapred-site.xml” section:
```
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```
- Click “Save Changes”
- Restart your map-reduce cluster with the updated configuration
Now you should be able to use LZO in your map-reduce, Hive, and Pig jobs.
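As a quick end-to-end check, you can compress a file with lzop, push it to HDFS, and index it with hadoop-lzo's LzoIndexer so map-reduce can split it (the jar path below is the typical CDH location — adjust it to wherever you installed the jar):

```
# Compress a sample file and upload it
lzop /tmp/sample.txt                       # produces /tmp/sample.txt.lzo
hadoop fs -put /tmp/sample.txt.lzo /tmp/

# Create the .index file that makes the .lzo splittable for map-reduce
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo-*.jar \
  com.hadoop.compression.lzo.LzoIndexer /tmp/sample.txt.lzo
```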