Today I entered the world of Big Data by installing the Cloudera VM on my work PC.
I also posted a blog about it: http://www.bloomconsultingbi.com/2013/01/getting-started-with-cloudera-hadoop.html
This is research and development for my job, as I'll be working with Hadoop and possibly Splunk in my new role.
The goal is to mash up weblog data, in the form of Big Data, with our current Enterprise Business Intelligence data.
So I started on the initial project to get acclimated with the environment.
They provide the sample code; however, that doesn't necessarily mean 'easy'.
The first thing I did was create a text file, add some Java code in a class and save the file.
The next step was to compile the Java class and package the resulting class files into a JAR file.
Easier said than done, because getting familiar with the Linux file structure takes some time, as do the commands, classpaths and what have you.
I eventually compiled everything, which led to my next roadblock.
It seems the Hadoop system was not reading my new JAR file, probably because its location was not on the Hadoop classpath (and it wasn't).
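A quick way to confirm that kind of suspicion is to look for the directory in the colon-separated classpath string. Here is a minimal sketch: `cp_has_entry` is a helper name made up for illustration, and the sample classpath is invented so the snippet runs without Hadoop installed.

```shell
# Hypothetical helper: succeeds if entry $2 appears in the
# colon-separated classpath string $1.
cp_has_entry() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Made-up classpath for illustration; on a real box you could pass
# the output of `hadoop classpath` instead.
sample_cp='/usr/lib/hadoop/lib/*:/usr/lib/hadoop/conf'

if cp_has_entry "$sample_cp" '/home/cloudera/wordcount_classes/*'; then
    echo "on the classpath"
else
    echo "missing from the classpath"
fi
```

With the made-up string above, the check reports the entry as missing, which is the situation described here.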
I found out we have an in-house resource who is familiar with this, and I pinged her.
She said to create an environment variable and append my new path to the current Hadoop Classpath.
The key thing is to use backticks, in the form of:
export MY_CLASSPATH=`hadoop classpath`:/home/cloudera/wordcount_classes/\*
or something to that effect, which you can then reference as $MY_CLASSPATH.
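For what it's worth, the backticks are shell command substitution: the shell runs `hadoop classpath` and pastes its output into the string, and `$(hadoop classpath)` is an equivalent spelling. Below is a rough sketch of the same idea. The fallback value is a placeholder I'm assuming purely so the snippet still runs on a machine without Hadoop.

```shell
# Backticks run the command and substitute its output into the string.
if command -v hadoop >/dev/null 2>&1; then
    base_cp=`hadoop classpath`
else
    # Placeholder value (an assumption) used only when Hadoop is absent.
    base_cp='/usr/lib/hadoop/lib/*'
fi

# Double quotes keep the trailing * as a literal character.
export MY_CLASSPATH="$base_cp:/home/cloudera/wordcount_classes/*"

# Show the combined value; the new directory is the last entry.
echo "$MY_CLASSPATH"
```

The variable only lasts for the current shell session, so it would need to be exported again (or added to a startup file) after logging back in.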
So that worked, and I'm on to my next roadblock, which is my understanding of how the sample is supposed to work.
However, that is enough for today.
The cool thing is this project is sponsored internally so I'm authorized to learn this stuff.
I'd have to say my 4 years of Java coding are paying off a bit; however, it would probably go more smoothly if I had more experience with Linux.
I'm really excited to get into this space, as I've been reading and blogging about it for a long time.
And there you have it!