Adding Custom Hive SerDe and UDF Libraries to Cloudera Hadoop 4.3

Posted in: Databases, Development
Tags: cloudera, custom, hadoop, hive, jar, java, json, serde, udf

26 Jul2013

Yet another small note about Cloudera Hadoop Distribution 4.3.

This time I needed to deploy some custom JAR files to our Hive cluster so that we wouldn’t need to do “ADD JAR” commands in every Hive job (especially useful when using HiveServer API).

Here is the process of adding a custom SerDE or a UDF jar to your Cloudera Hadoop cluster:

First, we have built our JSON SerDe and got a json-serde-1.1.6.jar file.
To make this file available to Hive CLI tools, we need to copy it to /usr/lib/hive/lib on every server in the cluster (I have prepared an rpm package to do just that).
To make sure Hive map-reduce jobs would be able to read/write JSON tables, we needed to copy our JAR file to /usr/lib/hadoop/lib directory on all task tracker servers in the cluster (the same rpm does that).
And last, really important step: To make sure your TaskTracker servers know about the new jar, you need to restart your tasktracker services (we use Cloudera Manager, so that was just a few mouse clicks ;-))

And this is it for today.

Homo-Adminus Blog

Yet Another Admin’s Blog