Get Up to Speed Fast in #Hadoop

If you've gone down the path of Big Data and Hadoop, the first thing you'll notice is the complexity.

First of all, there's the distributed architecture: multiple nodes, primary and secondary servers, MapReduce jobs in Java, SQL queries in Hive, data transformations in Pig. You soon realize how many layers of understanding are required.
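To see what the MapReduce model actually does, here's a toy word count as a local shell pipeline. It's only a sketch of the idea, not a real Hadoop job: each stage mimics one MapReduce phase, but on a cluster every phase runs distributed across nodes.

```shell
# Toy word count mirroring the MapReduce phases:
#   map (emit one word per line) -> shuffle (group by key) -> reduce (count).
echo "big data big hadoop data big" |
  tr ' ' '\n' |   # map: emit one key (word) per line
  sort |          # shuffle: bring identical keys together
  uniq -c |       # reduce: count each group of keys
  sort -rn        # order results by count, descending
```

The same mapper/reducer split is what you'd write as Java classes in a real MapReduce job; Hadoop Streaming even lets you plug shell commands like these into the framework directly.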

Then throw in Sqoop (ingesting data from and exporting data to OLTP systems), Mahout for machine learning and predictive analytics, and the security challenges on top.
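For a taste of what Sqoop looks like in practice, here's the shape of an import command. The connection string, table, and paths below are placeholders, not a real setup — you'd point it at your own OLTP database:

```shell
# Hypothetical Sqoop import: pull an OLTP table into HDFS.
# All connection details here are made-up placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user \
  --table orders \
  --target-dir /user/hadoop/orders \
  --num-mappers 4
```

Under the hood Sqoop generates a MapReduce job, so the `--num-mappers` flag controls how many parallel connections hit the source database.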

It soon becomes overwhelming.

Luckily, the top vendors have made virtual machines (VMs) available for download to help the average user get up to speed without having to worry about setting up the environment.

Here are links to some vendors (in no particular order):

Hortonworks Sandbox
Cloudera CDH
Microsoft HDInsight for Azure
IBM Big Data Platform

Personally, I've downloaded both the Hortonworks and Cloudera VMs and even attended a two-week Cloudera training course.

I really like Cloudera's Impala, which lets you write SQL-like queries that bypass the MapReduce process entirely.

Hortonworks has a similar project called Stinger, which uses Apache Tez to speed up query execution, with a goal of running 100x faster than traditional Hive.
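The nice thing is that the query itself doesn't change much between engines. A simple aggregation like the one below (the `orders` table is just an illustrative example) runs in both Hive and Impala; the difference is that Hive traditionally compiles it into MapReduce jobs, while Impala executes it with its own long-running daemons:

```sql
-- Hypothetical table; the same statement works in Hive and Impala.
-- Impala skips MapReduce and executes this directly on the cluster.
SELECT region,
       COUNT(*) AS order_count
FROM orders
GROUP BY region
ORDER BY order_count DESC;
```

That's the appeal for analysts: familiar SQL on top, with the engine underneath deciding how the work gets distributed.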

That should help you get up to speed on Hadoop, and both sites have good demos of use-case scenarios and why you'd want to leverage the awesomeness of Big Data.

Hope you enjoyed this blog and good luck in your Hadooping!