Wednesday, November 6, 2013

Stream Processing and BigData - Setting up Storm for Stream Processing of BigData

Pre-requisites - Before you start

At a minimum, you will need the following:

  1. The Virtualized Sandbox for your BigData exploits, set up as detailed in my blog post here: webcornucopia.blogspot.com/2013/11/a-virtualized-sandbox-for-my-bigdata.html
  2. A user with id hduser on your Ubuntu machine. Throughout this write-up we assume this user has been set up; if you're using a different user name, please make the appropriate modifications.

Downloads Required

1. Download Storm

Download the latest version of Storm from here: 
http://storm-project.net/downloads.html
As of this writing, the stable version of Storm is 0.8.2.

2. Install the following packages

First, install the following packages: make, pkg-config, libtool, automake, g++, uuid-dev, maven, and git. Here's how to do it:
sudo apt-get install make
sudo apt-get install pkg-config
sudo apt-get install libtool
sudo apt-get install automake
sudo apt-get install g++
sudo apt-get install uuid-dev
sudo apt-get install maven
sudo apt-get install git
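After the installs finish, you can quickly confirm that the build tools are on your PATH with a small check like the one below. It's only a sketch: the function name missing_commands is my own, and it checks commands rather than packages, so maven is checked as mvn, libtool as libtoolize, and uuid-dev (which only installs header files) is not covered.

```shell
# Hypothetical helper: print each command name (one per line) that is not on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Usage (commands provided by the packages above):
#   missing_commands make pkg-config libtoolize automake g++ mvn git
```

It should print nothing once everything is in place; any name it does print points to a package that still needs installing.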

3. Install the latest version of Leiningen

Leiningen is used to automate the builds of some of the Storm samples, several of which are written in Clojure, and it is still the best choice for automating Clojure projects. As mentioned at https://github.com/technomancy/leiningen/wiki/Packaging, many package managers still ship version 1.x, which is outdated; if you use apt-get, that is the version you will end up with. To get the latest version, install it manually as follows:
3.1. Save the contents of this shell script into a file called lein:
https://raw.github.com/technomancy/leiningen/stable/bin/lein
3.2. Move the lein file to /usr/local/bin (this needs sudo):
sudo mv lein /usr/local/bin/
3.3. Give all users full permissions on the lein file:
sudo chmod 777 /usr/local/bin/lein
3.4. Upgrade to the latest version of Leiningen, which is 2.3.3 as of this writing:
/usr/local/bin/lein upgrade
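Because the point of the manual install is to avoid the old 1.x release, it can be worth confirming which major version actually ended up on your PATH. The sketch below assumes that lein version prints a line starting with "Leiningen <version>", as the 2.x releases do; the function names are hypothetical.

```shell
# Hypothetical helper: extract the major version number from `lein version` output,
# assuming a line of the form "Leiningen 2.3.3 on Java ...".
lein_major_version() {
  printf '%s\n' "$1" | grep '^Leiningen ' | head -n 1 | awk '{ print $2 }' | cut -d. -f1
}

# Hypothetical check: warn if the lein on PATH is not a 2.x release.
check_lein() {
  major=$(lein_major_version "$(lein version 2>/dev/null)")
  if [ "$major" = "2" ]; then
    echo "OK: Leiningen 2.x found"
  else
    echo "WARNING: expected Leiningen 2.x, got '${major:-nothing}'" >&2
    return 1
  fi
}
```

Run check_lein after step 3.4; a warning usually means an older apt-get copy of lein is earlier on your PATH than /usr/local/bin.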

Building Storm

1. Extract storm-0.8.2.zip into a folder of your choice:
unzip storm-0.8.2.zip
2. From the folder where you extracted the Storm files, install ZeroMQ by running:
bin/install_zmq.sh

Note: Issues you may encounter while installing ZeroMQ, and how to get around them

Error Type #1
If you encounter this error while installing ZeroMQ:
Problem with the SSL CA cert (path? access rights?) while accessing...
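You can skip Git's SSL certificate verification as follows:
git config --global http.sslVerify false
And then re-run:
bin/install_zmq.sh
Once the install succeeds, you may want to turn verification back on:
git config --global http.sslVerify true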
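Setting http.sslVerify to false with --global turns off certificate checks for every Git repository your user works with. A narrower alternative is Git's GIT_SSL_NO_VERIFY environment variable, which affects only the command it is set for. This is a minimal sketch, assuming install_zmq.sh fetches its sources with git clone (as the error above suggests); the wrapper name is hypothetical.

```shell
# Hypothetical wrapper: run one command with Git's SSL certificate check disabled.
# GIT_SSL_NO_VERIFY is passed only to this command and its child processes,
# unlike `git config --global http.sslVerify false`, which persists.
without_git_ssl_verify() {
  GIT_SSL_NO_VERIFY=true "$@"
}

# Usage, from the Storm folder:
#   without_git_ssl_verify bin/install_zmq.sh
```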
Error Type #2 - Happens mostly on Ubuntu and Mac
If you encounter this error while installing ZeroMQ:
make[1]: *** No rule to make target `classdist_noinst.stamp', needed by
`org/zeromq/ZMQ.class'.  Stop.
This is a known bug discovered by ebroder. See https://github.com/zeromq/jzmq/issues/114.
2.1. To fix it, edit the Makefile.am file at jzmq/src:
vi jzmq/src/Makefile.am 
2.2. Replace classdist_noinst.stamp with classnoinst.stamp
2.3. Re-run ZeroMQ install:
bin/install_zmq.sh
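If you'd rather not edit Makefile.am by hand, the replacement in step 2.2 can be scripted with sed. A sketch only: the function name is hypothetical, and it writes to a temporary file and moves it back because GNU sed (Ubuntu) and BSD sed (Mac) disagree on how -i takes its argument.

```shell
# Hypothetical helper: apply the jzmq Makefile.am fix from step 2.2 non-interactively.
# Argument: path to Makefile.am (defaults to jzmq/src/Makefile.am).
fix_jzmq_makefile() {
  makefile="${1:-jzmq/src/Makefile.am}"
  sed 's/classdist_noinst\.stamp/classnoinst.stamp/g' "$makefile" > "$makefile.tmp" &&
    mv "$makefile.tmp" "$makefile"
}

# Usage, from the Storm folder:
#   fix_jzmq_makefile && bin/install_zmq.sh
```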

Setting up Storm Environment Variables

Once the ZeroMQ install is done, add the Storm environment variables to your .bashrc (or whichever startup script you use) so you don't have to type them in again and again:
vi $HOME/.bashrc
At the end of the file, add the following lines, making sure you specify the correct path. My Storm files are in $HOME/runtimes/storm-0.8.2, so my startup script has the following:
export STORM_PATH=$HOME/runtimes/storm-0.8.2
export PATH=$PATH:$STORM_PATH/bin
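If you set up machines from a script, appending these two lines blindly will duplicate them every time it runs. A small idempotent helper avoids that by appending only when STORM_PATH isn't exported yet. This is a sketch; the function name is hypothetical, and it only looks for an existing line starting with export STORM_PATH=.

```shell
# Hypothetical helper: append the Storm variables to a startup file, once.
# Arguments: the startup file and the Storm folder (written into the file as-is).
add_storm_env() {
  rc="$1"
  storm_dir="$2"
  if ! grep -q '^export STORM_PATH=' "$rc" 2>/dev/null; then
    printf 'export STORM_PATH=%s\n' "$storm_dir" >> "$rc"
    printf '%s\n' 'export PATH=$PATH:$STORM_PATH/bin' >> "$rc"
  fi
}

# Usage (single quotes keep $HOME literal in .bashrc, matching the lines above):
#   add_storm_env "$HOME/.bashrc" '$HOME/runtimes/storm-0.8.2'
```

Open a new terminal (or run . $HOME/.bashrc) afterwards so the variables take effect.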

Getting and Building Storm Samples

The Storm Starter kit at https://github.com/nathanmarz/storm-starter contains some examples to play with. The commands below assume it lives in $HOME/runtimes, so clone it with Git from that folder:
git clone http://github.com/nathanmarz/storm-starter

Building the Storm Starter Samples

Change into the storm-starter folder and build the samples:
cd $HOME/runtimes/storm-starter
/usr/local/bin/lein deps
/usr/local/bin/lein compile
/usr/local/bin/lein jar
The last lines of output from these commands look like this:
Compiling storm.starter.clj.word-count
hduser@ubuntu:~/runtimes/storm-starter$ /usr/local/bin/lein jar
Retrieving org/clojure/clojure/1.5.1/clojure-1.5.1.pom from central
Retrieving org/clojure/clojure/1.5.1/clojure-1.5.1.jar from central
Compiling 2 source files to /home/hduser/runtimes/storm-starter/target/classes
Created /home/hduser/runtimes/storm-starter/target/storm-starter-0.0.1-SNAPSHOT.jar
hduser@ubuntu:~/runtimes/storm-starter$

Run Storm Samples

For more details, see:
https://github.com/nathanmarz/storm-starter

1. Executing the ExclamationTopology Sample written in Java:

java -cp "$STORM_PATH/lib/*:$STORM_PATH/storm-0.8.2.jar:\
$HOME/runtimes/storm-starter/target/storm-starter-0.0.1-SNAPSHOT.jar" \
storm.starter.ExclamationTopology
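If you run the sample often, a small wrapper can keep this classpath in one place and fail early when a jar is missing (for example, if lein jar hasn't been run yet). A sketch only: the function name is hypothetical, and it relies on Java's own expansion of a classpath entry ending in /*, which is why that entry stays quoted so the shell doesn't try to glob it.

```shell
# Hypothetical helper: print the classpath used above, failing early if a piece is missing.
# Arguments: the Storm folder and the storm-starter jar.
storm_starter_classpath() {
  storm_dir="$1"
  starter_jar="$2"
  if [ ! -d "$storm_dir/lib" ]; then
    echo "missing: $storm_dir/lib" >&2
    return 1
  fi
  for f in "$storm_dir/storm-0.8.2.jar" "$starter_jar"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  # The lib/* entry is left for Java to expand, so it is never unquoted here.
  printf '%s\n' "$storm_dir/lib/*:$storm_dir/storm-0.8.2.jar:$starter_jar"
}

# Usage:
#   cp=$(storm_starter_classpath "$STORM_PATH" \
#     "$HOME/runtimes/storm-starter/target/storm-starter-0.0.1-SNAPSHOT.jar") &&
#     java -cp "$cp" storm.starter.ExclamationTopology
```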

2. Executing the Word Count Sample written in Clojure, from the storm-starter folder:

/usr/local/bin/lein run -m storm.starter.clj.word-count

Conclusion

We have successfully installed Storm and run the samples. We are now ready to perform further explorations in Stream Processing of BigData.
