5/09/2013

#Hadoop Garbage Truck Analogy

When I think of Hadoop, I think of large sets of data.

That data is stored in the Hadoop Distributed File System (HDFS).

Once it's there, you can query its contents.

Running a MapReduce job over that data is similar to garbage pickup, where the trucks are basically the Map phase.

Each truck goes around the neighborhood, picking up a copy of your garbage.

There's a separate truck for each house.

Once they've got your load, they return to a meeting point, where all the garbage is collected.

There, it gets Shuffled, Sorted and Merged.

Then they send that batch of merged, sorted garbage to another hub, chosen by the kind of garbage (the key) rather than at random, where it is re-sorted with the matching garbage from all the other trucks. That's the Reduce phase.

The Reducer sorts all the garbage again and produces its own output, in the form of one or more files.
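
To make the trip a little more concrete, here's a minimal sketch of the same flow in plain Python. It's not Hadoop code, just an in-memory imitation: the function names (`map_phase`, `partition`, `shuffle_and_sort`, `reduce_phase`, `run_job`) and the sample houses are made up for illustration, and the job simply counts how much of each kind of garbage was picked up. Real Hadoop picks the hub from the key (by default a hash of the key); the sketch assumes `crc32` as a stand-in for that hash.

```python
from collections import defaultdict
from zlib import crc32


def map_phase(house):
    """One 'truck' per house: emit a (kind, 1) pair for every item picked up."""
    return [(item, 1) for item in house]


def partition(key, num_reducers):
    """Pick a hub from the key itself, so equal keys always meet at the same hub.
    crc32 is only a stand-in for Hadoop's default key hash."""
    return crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(mapped_outputs, num_reducers):
    """Route every pair to its hub, group values by key, and sort each hub by key."""
    hubs = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped_outputs:
        for key, value in pairs:
            hubs[partition(key, num_reducers)][key].append(value)
    return [sorted(hub.items()) for hub in hubs]


def reduce_phase(hub):
    """One reducer per hub: total up the values collected for each key."""
    return [(key, sum(values)) for key, values in hub]


def run_job(houses, num_reducers=2):
    mapped = [map_phase(house) for house in houses]   # Map
    hubs = shuffle_and_sort(mapped, num_reducers)     # Shuffle, Sort, Merge
    return [reduce_phase(hub) for hub in hubs]        # Reduce, one output per hub


# Hypothetical sample data: three houses and what each one put out.
houses = [
    ["paper", "glass", "paper"],
    ["glass", "cans"],
    ["paper", "cans", "cans"],
]
outputs = run_job(houses)
for number, part in enumerate(outputs):
    print(f"reducer {number}: {part}")
```

Each inner list in `outputs` plays the role of one reducer's output file. Which kinds of garbage land at which hub depends on the hash, but every kind shows up at exactly one hub with its full count.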

Kind of a stretch analogy, but it's close.

And there you have it.

