#HADOOP Data Stored in Several Locations

So let's think about this for a minute.

You have data in your Relational Database, you have web logs and perhaps sensor data.

So the data exists in one place.

You push all this data to HADOOP, where it lands in the HADOOP Distributed File System (HDFS), which by default actually keeps each block in 3 separate places.

So the data exists in two places.

Then you push some of that data to a NoSQL Database, perhaps HBase, which makes another copy and stores it on top of HDFS.

So the data exists in three places.

Then you create some HIVE tables, which store another copy in HDFS.

So the data exists in four places.

Then you export that data to perhaps Excel or back to another Relational database for Users.

So the data exists in five places.

HADOOP is known for storing huge volumes of data on commodity hardware to save costs.

However, as we just stepped through, it could end up costing you way more because the data is replicated in several different places.

Just something to consider when determining total cost of data ownership.
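
To put rough numbers on the walk-through above, here is a small back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the 10 TB dataset, the fraction of data that ends up in each copy, and the idea that every HDFS-backed copy is replicated 3 times. The function name `physical_tb` is made up for this example. Plug in your own figures.

```python
# Hypothetical tally of physical storage for one logical dataset.
# All sizes and fractions below are illustrative assumptions, not measurements.

HDFS_REPLICATION = 3  # the "3 separate places"; assumed for every HDFS-backed copy


def physical_tb(logical_tb, copies):
    """Sum the physical terabytes used across all copies.

    copies is a list of (name, fraction_of_data, on_hdfs) tuples.
    """
    total = 0.0
    for name, fraction, on_hdfs in copies:
        factor = HDFS_REPLICATION if on_hdfs else 1
        total += logical_tb * fraction * factor
    return total


copies = [
    ("source systems", 1.0, False),    # relational database, web logs, sensors
    ("raw files in HDFS", 1.0, True),
    ("HBase", 0.5, True),              # "some of that data"
    ("Hive tables", 1.0, True),
    ("export for users", 0.25, False), # Excel or another relational database
]

total = physical_tb(10, copies)
print(total)  # 87.5
```

Under these made-up inputs, 10 TB of logical data turns into 87.5 TB of physical storage, and most of that comes from the HDFS-backed copies being multiplied by the replication factor.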
