Yet another small note about Cloudera Hadoop Distribution 4.3.
This time I needed to deploy some custom JAR files to our Hive cluster so that we wouldn’t need to run “ADD JAR” statements in every Hive job (especially useful when using the HiveServer API).
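For reference, this is the kind of boilerplate we wanted to get rid of; a minimal sketch of a job that has to register the jar by hand (the jar path, table name and column name here are made up):

```sh
# Without a cluster-wide install, every job has to load the SerDe itself:
hive -e "
  ADD JAR /tmp/json-serde-1.1.6.jar;
  SELECT user_id FROM events_json LIMIT 10;  -- events_json / user_id are hypothetical
"
```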
Here is the process of adding a custom SerDE or a UDF jar to your Cloudera Hadoop cluster:
- First, we built our JSON SerDe and ended up with a json-serde-1.1.6.jar file.
- To make this file available to the Hive CLI tools, we need to copy it to /usr/lib/hive/lib on every server in the cluster (I have prepared an rpm package to do just that).
- To make sure Hive map-reduce jobs can read and write JSON tables, we also need to copy the JAR file to the /usr/lib/hadoop/lib directory on all task tracker servers in the cluster (the same rpm does that).
- And the last, really important step: to make sure your TaskTracker servers pick up the new jar, you need to restart the tasktracker services (we use Cloudera Manager, so that was just a few mouse clicks ;-)). A rough shell sketch of the copy-and-restart procedure, plus a quick way to verify the result, follows this list.
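If you would rather not build an rpm, the same thing can be done with a small shell loop. This is only a sketch: the host names are placeholders, it assumes passwordless ssh as root, and the tasktracker init service is assumed to be hadoop-0.20-mapreduce-tasktracker (the CDH4 MRv1 package name); if your services are managed by Cloudera Manager, restart them from there instead.

```sh
#!/bin/bash
# Push the SerDe jar to every node and restart the tasktrackers.
JAR=json-serde-1.1.6.jar

for host in node01 node02 node03; do        # placeholder host names
  # Hive CLI and HiveServer pick the jar up from here
  scp "$JAR" "root@${host}:/usr/lib/hive/lib/"
  # map-reduce tasks need it on the tasktracker classpath as well
  scp "$JAR" "root@${host}:/usr/lib/hadoop/lib/"
  # the tasktracker only re-reads its classpath on restart
  ssh "root@${host}" "service hadoop-0.20-mapreduce-tasktracker restart"
done
```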
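Once the jar is in place everywhere, a quick sanity check is to define and query a JSON table without a single ADD JAR statement. A sketch, assuming the SerDe class name is org.openx.data.jsonserde.JsonSerDe (as shipped in the openx json-serde jar) and a made-up table:

```sh
hive -e "
  CREATE TABLE IF NOT EXISTS events_json (
    user_id STRING,
    action  STRING
  )
  ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
  STORED AS TEXTFILE;

  SELECT COUNT(*) FROM events_json;
"
```

The CREATE TABLE exercises the client-side classpath (/usr/lib/hive/lib), and the COUNT(*) launches a map-reduce job, so it also verifies that the tasktrackers see the jar.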
And that’s it for today.