Today is day 2 of Hadoop training.
First we did an exercise using Eclipse in Java.
We loaded some sample Java files into the IDE: the Driver, Mapper and Reducer.
Then we exported them to a Jar file and ran a Hadoop command to start the job, specifying the Input and Output locations.
Then we learned about Unit testing in JUnit and did an exercise.
One cool feature is the ability to step through the Java code in the Eclipse IDE.
After lunch we discussed Combiners, which sit between the Mapper phase and the Reducer phase. Basically, a Combiner sums up each Mapper's output nice and neat before it is sent to the Reducer, which saves network traffic when dealing with huge sets of data.
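To make the Combiner idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The class and method names (`CombinerSketch`, `map`, `combine`) are made up for illustration and are not from the course labs. It only imitates the idea with a word count: the map step emits one (word, 1) record per word, and the combine step sums them per word before anything would go over the network.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Mapper -> Combiner hand-off; not real Hadoop code.
public class CombinerSketch {

    // Imitates a Mapper: emits one (word, 1) record for every word in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Imitates a Combiner: sums the counts per word on the map side,
    // so fewer records would need to be shipped to the Reducer.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> record : mapOutput) {
            sums.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = map("to be or not to be");
        Map<String, Integer> combined = combine(mapOutput);
        System.out.println("Records before combining: " + mapOutput.size());
        System.out.println("Records after combining: " + combined.size());
        System.out.println(combined);
    }
}
```

In a real job the Combiner is a Reducer class registered on the Job with `setCombinerClass`, and Hadoop decides whether and how often to run it, so the logic has to give the same answer either way (summing does).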
Next were Partitioners. Basically, you can control which Reducer each key is sent to. For example, if you parse a date by month, you specify the 12 months in a configuration, then create 12 partitions (0-11), and based on the month, you send each record to the Reducer for that partition.
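Here is a rough sketch of the month example, again in plain Java so it runs without Hadoop. The class name `MonthPartitionSketch` and the three-letter month list are my own assumptions, not the lab solution. The method has the same shape as the `getPartition(key, value, numPartitions)` method you override in a Hadoop Partitioner, minus the value argument.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a custom Partitioner; not real Hadoop code.
public class MonthPartitionSketch {

    // Assumed configuration: the 12 months, in order, so Jan -> 0 ... Dec -> 11.
    static final List<String> MONTHS = Arrays.asList(
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");

    // Returns the partition (and so the Reducer) that a month key is routed to.
    // The modulo keeps the result in range if fewer than 12 Reducers are configured.
    static int getPartition(String month, int numPartitions) {
        int index = MONTHS.indexOf(month);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown month: " + month);
        }
        return index % numPartitions;
    }

    public static void main(String[] args) {
        for (String month : MONTHS) {
            System.out.println(month + " -> partition " + getPartition(month, 12));
        }
    }
}
```

In a real job you would register the class with `setPartitionerClass` on the Job and set the number of reduce tasks to 12 so each month gets its own Reducer.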
We did labs on both these topics. Basically, they provide the solutions, so I get them to run and then study the code.
I think the next topic will be Map-only jobs (no Reducers) and Counters.
We'll see what happens!