Day 2 Hadoop Training

Today is day 2 of Hadoop training.

First we did an exercise using Eclipse in Java.

We loaded some sample Java files into the IDE: the Driver, the Mapper and the Reducer.

Then we exported the project to a JAR file and ran a Hadoop command to start the job, specifying the input and output locations.

Then we learned about unit testing with JUnit and did an exercise.

One cool feature is the ability to step through the Java code in the Eclipse IDE.

After lunch we discussed Combiners, which sit between the Mapper phase and the Reducer phase. Basically, a Combiner sums up the data nice and neat on the map side before it is sent to the Reducer, which saves network traffic when dealing with huge sets of data.
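
To make the idea concrete, here is a minimal sketch of what a summing Combiner does, written in plain Java with no Hadoop dependency. The class name `CombinerSketch`, the `pair` helper and the sample words are all made up for illustration; this is not the lab code from the training.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: simulates a summing combiner in plain Java.
public class CombinerSketch {

    // Helper (an assumption for this sketch) that builds one (key, value) pair.
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    // Sums the values for each key locally, the way a summing combiner
    // would on one mapper's output before anything crosses the network.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : mapOutput) {
            combined.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Raw mapper output: one (word, 1) pair per occurrence.
        List<Map.Entry<String, Integer>> mapOutput = Arrays.asList(
                pair("hadoop", 1), pair("hive", 1), pair("hadoop", 1),
                pair("hadoop", 1), pair("hive", 1));

        Map<String, Integer> combined = combine(mapOutput);

        // Fewer pairs leave the map side than the mapper emitted.
        System.out.println(mapOutput.size() + " pairs in, "
                + combined.size() + " pairs out: " + combined);
    }
}
```

The Reducer still does the final sum across all mappers; the Combiner only shrinks what each mapper has to ship.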

Next were Partitioners: basically, you can control which Reducer (partition) each key gets sent to. For example, if you parse a date by month, you specify the 12 months in a configuration, then create 12 partitions (0-11), and based on the month, each record goes to the matching partition for the Reduce phase.
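
The month example can be sketched roughly like this, again in plain Java so it runs without Hadoop. In a real job this logic would live in the `getPartition` method of a `Partitioner` subclass; the class name `MonthPartitionSketch`, the three-letter month names and the modulo fallback for fewer than 12 partitions are my own assumptions, not the lab solution.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: the month-to-partition lookup in plain Java.
public class MonthPartitionSketch {

    // The 12 months; a month's position in this list (0-11) is its partition.
    static final List<String> MONTHS = Arrays.asList(
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");

    // Returns the partition (and so the reducer) for a given month.
    // The modulo is an assumption so the sketch still returns a valid
    // partition when the job has fewer than 12 reducers.
    static int partitionFor(String month, int numPartitions) {
        int index = MONTHS.indexOf(month);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown month: " + month);
        }
        return index % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 12;
        for (String month : Arrays.asList("Jan", "Jun", "Dec")) {
            System.out.println(month + " -> partition "
                    + partitionFor(month, numPartitions));
        }
    }
}
```

With 12 partitions, every record for a given month lands on the same reducer, so each reducer's output covers exactly one month.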

We did labs on both of these topics. Basically, they provide the solutions, I get them to run, and then I study the code.

I think the next topic will be map-only jobs (no Reducers) and Counters.

We'll see what happens!
