So let's think about this for a minute.
You have data in your relational database, plus web logs and perhaps sensor data.
So the data exists in one place.
You push all this data to Hadoop, where it gets stored in the Hadoop Distributed File System (actually in 3 separate places, since HDFS keeps three replicas of each block by default).
So the data exists in two places.
Then you push some of that data to a NoSQL database, perhaps HBase, which makes another copy and stores it on top of HDFS.
So the data exists in three places.
Then you create some Hive tables, which can store yet another copy in HDFS.
So the data exists in four places.
Then you export that data, perhaps to Excel or back to another relational database, for users.
So the data exists in five places.
Hadoop is known for storing huge volumes of data on commodity hardware to save costs.
However, as we just stepped through, it could end up costing you way more because the data is replicated in several different places.
Just something to consider when determining total cost of data ownership.
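
To put rough numbers on it, here is a quick back-of-the-envelope sketch in Python. Every size and fraction in it is a made-up assumption for illustration (a 10 TB source, half of it copied to HBase and Hive, a tenth exported), and the function name `total_footprint_tb` is hypothetical too. The only real-world figure is the HDFS default replication factor of 3.

```python
# Back-of-the-envelope estimate of the raw storage one dataset consumes
# as it gets copied through the pipeline. All sizes are hypothetical.

HDFS_REPLICATION = 3  # HDFS default replication factor (dfs.replication)


def total_footprint_tb(source_tb, copies):
    """Return total raw storage in TB, including the original source copy.

    copies: list of (name, fraction_of_source, on_hdfs) tuples, where
    on_hdfs says whether that copy is multiplied by HDFS replication.
    """
    total = source_tb  # the original copy in the source systems
    for _name, fraction, on_hdfs in copies:
        logical_tb = source_tb * fraction
        total += logical_tb * (HDFS_REPLICATION if on_hdfs else 1)
    return total


# Assumed pipeline: fractions are illustrative guesses, not measurements.
pipeline = [
    ("HDFS landing zone", 1.0, True),
    ("HBase", 0.5, True),
    ("Hive tables", 0.5, True),
    ("Export for users", 0.1, False),
]

footprint = total_footprint_tb(10, pipeline)
print(f"Raw storage used: {footprint} TB for a 10 TB source")
```

Under those assumptions, the raw footprint comes out to several times the size of the source, and most of it comes from HDFS replication multiplying each Hadoop-side copy. Swap in your own sizes and fractions to see how your pipeline compares.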