When I think of Hadoop, I think of large sets of data stored in the Hadoop Distributed File System (HDFS), where you can query its contents.
MapReduce works a bit like a fleet of garbage pickup trucks. The pickup run is basically the Map phase.
Each truck goes around the neighborhood, picking up a copy of your garbage.
There's a separate truck for each house.
Once they've got your load, they return to a meeting point, where all the garbage is collected.
There, it gets Shuffled, Sorted and Merged.
Then they send that batch of merged, sorted garbage to another hub, picked by the type of garbage (the key) rather than at random, where it is combined with the matching loads from all the other garbage trucks. That's the Reducer phase.
The Reducer sorts all the garbage it receives again and produces its own output, in the form of one or more files.
Kind of a stretch analogy, but it's close.
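To make the truck route a little more concrete, here's a rough sketch of the same flow in plain Python. This is not Hadoop code, just a toy word count on one machine, and every name in it (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) is something I made up for illustration.

```python
from collections import defaultdict


def map_phase(split):
    """One 'truck': emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_and_sort(mapped_outputs):
    """The 'meeting point': group all values by key, then sort by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """The 'hub': combine everything collected for one key into one result."""
    return key, sum(values)


def word_count(splits):
    """Run map, shuffle/sort, and reduce in sequence over a list of splits."""
    mapped = [map_phase(split) for split in splits]
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


# Two hypothetical input splits, one per "house".
splits = ["paper glass paper", "glass plastic paper"]
result = word_count(splits)
print(result)  # [('glass', 2), ('paper', 3), ('plastic', 1)]
```

In real Hadoop the mappers and reducers run on different machines and the framework handles the shuffle for you. Here everything runs in one process, so it only shows the shape of the data at each step.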
And there you have it.