Tag: java
Adding Custom Hive SerDe and UDF Libraries to Cloudera Hadoop 4.3
26 Jul2013

Yet another small note about Cloudera Hadoop Distribution 4.3.

This time I needed to deploy some custom JAR files to our Hive cluster so that we wouldn’t need to do “ADD JAR” commands in every Hive job (especially useful when using HiveServer API).

Here is the process of adding a custom SerDE or a UDF jar to your Cloudera Hadoop cluster:

  • First, we have built our JSON SerDe and got a json-serde-1.1.6.jar file.
  • To make this file available to Hive CLI tools, we need to copy it to /usr/lib/hive/lib on every server in the cluster (I have prepared an rpm package to do just that).
  • To make sure Hive map-reduce jobs would be able to read/write JSON tables, we needed to copy our JAR file to /usr/lib/hadoop/lib directory on all task tracker servers in the cluster (the same rpm does that).
  • And last, really important step: To make sure your TaskTracker servers know about the new jar, you need to restart your tasktracker services (we use Cloudera Manager, so that was just a few mouse clicks ;-))

And this is it for today.


ActiveMQ Tips: Flow Control and Stalled Producers Problem
23 Jan2009

It’s been a few months since we‘ve started actively using ActiveMQ queue server in our project. For some time we had pretty weird problems with it and even started thinking about switching to something else or even writing our own queue server which would comply with our requirements. The most annoying problem was the following: some time after activemq restart everything worked really well and then activemq started lagging, queue started growing and all producer processes were stalling on push() operations. We rewrote our producers from Ruby to JRuby, then to Java and still – after some time everything was in a bad shape until we restarted the queue server.

Read the rest of this entry