Summary of Hadoop (1st week)

I've been working with Hadoop for the past week and getting some good exposure to Linux along the way.

It's basically my first time, so every step that should take 10 seconds takes me 10 minutes or even hours.

I've been researching commands on Google as I go.

However, I'm getting the hang of some of it and can really fly through the directories.

It reminds me of the good old days in DOS in the early 1980s.

So, to summarize, I got a Hadoop cluster up and running on the Cloudera VM.

I turned a sample WordCount class into a JAR file and launched a MapReduce job against the Hadoop file system, which produced output files.
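
For anyone who hasn't seen WordCount before, here is a rough sketch in plain Python of what the job does conceptually. To be clear, this is not the Java class I ran and it doesn't touch Hadoop at all; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and lowercasing the words is my own assumption.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    # Lowercasing is an assumption made here so "The" and "the" count together.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted counts by word, roughly what happens between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Sum all the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())
```

Calling word_count(["the quick brown fox", "The lazy dog"]) gives a dictionary with "the" counted twice and every other word once. In the real job, Hadoop runs the map and reduce steps across the cluster and writes the totals to output files instead of returning them.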

Then I played with Hive and Pig.

Then I worked with Pentaho dashboards and reports.

And then Pentaho Spoon, which is a graphical tool for connecting to a variety of data sources including Hadoop, running jobs, and exporting to an output data source.

I tried to get my first example to run in this environment and ran into several stumbling blocks.

Turns out the out-of-the-box configuration needs to be tweaked, and I think I over-tweaked it.

However, I got it back to its original state by setting the fs.default back to

To get the Pentaho job to run, though, I think it will need to be changed, perhaps to localhost or - time will tell.

It's been quite a learning experience.

It's similar to being dropped from a helicopter into foreign territory and having to find your way back.

It can be done, but it's slow going. I've run into just about every obstacle so far, but I just keep pushing forward in the hope that everything will work.

And so it goes!