Yet another small note about Cloudera Hadoop Distribution 4.3.
This time I needed to deploy some custom JAR files to our Hive cluster so that we wouldn’t need to run “ADD JAR” commands in every Hive job (especially useful when using the HiveServer API).
Here is the process of adding a custom SerDe or UDF JAR to your Cloudera Hadoop cluster:
- First, we built our JSON SerDe and got a JAR file.
- To make this file available to the Hive CLI tools, we needed to copy it to /usr/lib/hive/lib on every server in the cluster (I prepared an rpm package to do just that).
- To make sure Hive map-reduce jobs would be able to read/write JSON tables, we needed to copy our JAR file to the /usr/lib/hadoop/lib directory on all TaskTracker servers in the cluster (the same rpm does that).
- And the last, really important step: to make sure your TaskTracker servers know about the new JAR, you need to restart the TaskTracker services (we use Cloudera Manager, so that was just a few mouse clicks).
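If you don't want to build an rpm, the two copy steps above could be sketched roughly like this. This is only an illustration, not what our package actually does: the `install_jar` helper is made up for this note, and the only things taken from the steps above are the two lib directories. You would run it on each node with enough permissions to write there, and you would still need to restart the TaskTrackers afterwards.

```python
import shutil
from pathlib import Path

# Lib directories named in the steps above (CDH 4.3 package layout).
HIVE_LIB = Path("/usr/lib/hive/lib")
HADOOP_LIB = Path("/usr/lib/hadoop/lib")


def install_jar(jar_path, target_dirs=(HIVE_LIB, HADOOP_LIB)):
    """Copy the JAR into every target directory and return the new paths.

    Hypothetical helper: it only copies files on the local node and does
    not restart any services.
    """
    jar = Path(jar_path)
    if not jar.is_file():
        raise FileNotFoundError(f"JAR not found: {jar}")

    installed = []
    for target in target_dirs:
        target = Path(target)
        # Fail loudly instead of creating a directory Hive/Hadoop won't read.
        if not target.is_dir():
            raise NotADirectoryError(f"missing lib directory: {target}")
        dest = target / jar.name
        shutil.copy2(jar, dest)  # copy2 also preserves timestamps and mode
        installed.append(dest)
    return installed
```

Calling `install_jar("my-serde.jar")` (the file name here is a placeholder) on a node would put the JAR into both directories; on nodes that only run the Hive CLI you could pass `target_dirs=[HIVE_LIB]` instead.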
And this is it for today.