HDFS is a file system that sits on top of the operating system. You store files in HDFS so Hadoop can recognize them. The metadata goes into HCatalog, so future programmers know where the data resides.
Why not have a Hadoop-like system read a layer below HDFS, at the level of the actual file system? Let it read all the contents of your computer, self-index, and make that data available through Hadoop.
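To make that self-indexing step concrete, here's a minimal sketch in Python. It's purely illustrative, not part of any real Hadoop component: the function name `index_tree` and the metadata fields are my own choices. It walks a directory tree and collects per-file metadata of the kind a Hadoop-style catalog could later ingest.

```python
import os
import tempfile

def index_tree(root):
    """Walk a directory tree and build a simple metadata index.
    (index_tree is a hypothetical name used for illustration.)"""
    index = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            index.append({
                "path": path,           # where the file lives
                "size_bytes": st.st_size,
                "modified": st.st_mtime,
            })
    return index

# Build a tiny directory tree and index it.
root = tempfile.mkdtemp()
with open(os.path.join(root, "notes.txt"), "w") as f:
    f.write("hello")

catalog = index_tree(root)
print(len(catalog), catalog[0]["size_bytes"])  # prints: 1 5
```

A real version would also need to stream file contents into HDFS blocks and register the metadata in something like HCatalog, but the crawl-and-catalog loop above is the core of the idea.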
Then you could link together all the computers in your organization, in real time. I don't see why not. Hadoop could read in every file on every machine: an internal brain of the organization. People could search for things, run analytics, and it could serve as a file backup system, archived forever. And why not add the email exchange server as well? All email that flows through an organization is owned by the organization, so technically it's the organization's property. Perhaps lock down specific users' folders, for managers, VPs, and others handling sensitive information.
I don't see why the world of Hadoop should be limited to reading from HDFS alone. Open up the hard drives of the operating systems as well for greater insight.