If you've gone down the path of Big Data and Hadoop, the first thing you'll notice is the complexity.
First there is the distributed architecture to understand: multiple nodes, primary and secondary servers, Map/Reduce jobs written in Java, SQL commands in Hive, and data transformations in Pig. You soon realize how many layers of understanding are required.
Then throw in Sqoop (ingesting data from and sending data to OLTP systems), Mahout for machine learning and predictive analytics, and the security challenges.
It soon becomes overwhelming.
Luckily, the top vendors offer downloadable VMs that help the average user get up to speed without having to worry about setting up the environment.
Here are links to some vendors (in no particular order):
Microsoft HDInsight for Azure
IBM Big Data Platform
Personally, I've downloaded both and even attended a two-week Cloudera training course.
I really like Cloudera's Impala, which lets you write SQL-like queries that bypass the Map/Reduce process.
Hortonworks, though, has a similar project called Stinger, which uses Tez to speed up the Map/Reduce process, with a goal of running 100x faster than traditional Hive.
That should help you get up to speed on Hadoop. Both sites have good demos of use-case scenarios that show why you would want to leverage the awesomeness of Big Data.
Hope you enjoyed this blog and good luck in your Hadooping!