I attended the Orlando Data Science Meetup Group today, 6/11/2014.
Great turnout. I was working in Lakeland about an hour away, so the drive to Orlando wasn't too bad, even with the rain and rush hour downtown traffic.
Started with introductions, then dove into a clean install of Hadoop on a Linux box.
Note: one difference between Hadoop v1 and v2 is that the number of folders that comprise Hadoop has been reduced.
For me Linux isn't an everyday environment, so I just watched the install on the projector.
Meanwhile, I already had Hyper-V Hortonworks VMs running a 1.3 Linux install and a 2.0 Linux install, as well as a Windows 2.0 version of Hadoop, but I don't have a bare Linux operating system to do the install on.
After a short time the environment was set up and Hadoop was up and running.
The presenter who did the Hadoop installation really knew his stuff and managed to do the install from scratch seamlessly. I've poked around in the folders and config files, and there's a lot to know, so I understand the level of knowledge and detail involved.
Understanding Linux is required for the install, including commands, editors, RSA keys, passwords, users, folder structures, and a ton more.
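To give a feel for the config side of what the presenter walked through, here's a minimal sketch of a single-node (pseudo-distributed) Hadoop 2.x setup. The property names and values are the standard ones from the Hadoop docs, but the target directory is just a stand-in I made up for this sketch; on a real box you'd write into $HADOOP_HOME/etc/hadoop:

```shell
#!/bin/sh
# Sketch of a single-node (pseudo-distributed) Hadoop 2.x configuration.
# The directory below is a stand-in; on a real install this would be
# $HADOOP_HOME/etc/hadoop.
HADOOP_CONF="${TMPDIR:-/tmp}/hadoop-conf-sketch"
mkdir -p "$HADOOP_CONF"

# core-site.xml: tell Hadoop where the default filesystem (HDFS) lives.
cat > "$HADOOP_CONF/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: a single node can only hold one replica of each block.
cat > "$HADOOP_CONF/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

# On a real install you'd also set up the passwordless SSH mentioned
# above (RSA keys), then format the namenode and start HDFS:
#   ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
#   cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#   bin/hdfs namenode -format
#   sbin/start-dfs.sh
echo "wrote config to $HADOOP_CONF"
```

That's just the skeleton; the real install also covers users, environment variables like JAVA_HOME, and the rest of the folder structure the presenter knew cold.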
They recommend using a bunch of Mac minis to set up a small cluster. Cool, huh?
For larger clusters, they recommend renting time on someone else's cluster.
There was mention of creating a cluster on VMs, but overall there are no cheap solutions for creating, running, and administering a large cluster.
In speaking with some attendees, they recommended downloading a Linux ISO and installing from scratch for Hyper-V.
Here's a link: http://blogs.msdn.com/b/virtual_pc_guy/archive/2010/10/21/installing-ubuntu-server-10-10-on-hyper-v.aspx
This was the first of three sessions; next up is YARN, then MapReduce.
Looking forward to it!