In Hadoop, Hive gives us tables: we can create them, load data into them, query them, and drop them.
Another nice feature of Hive tables is that they can store data in a binary format instead of plain text. One such format is ORC (Optimized Row Columnar), and ORC tables are fairly easy to create:
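DROP TABLE IF EXISTS NewTableORC;
CREATE EXTERNAL TABLE NewTableORC (
  -- hypothetical columns: list the same columns, in the same order, as OriginalTable
  id INT,
  name STRING
)
-- no ROW FORMAT DELIMITED / TERMINATED BY clauses: ORC uses its own SerDe and ignores text delimiters
STORED AS ORC;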
Once the table exists, populate it from the original table with a plain SQL SELECT:
INSERT OVERWRITE TABLE NewTableORC SELECT * FROM OriginalTable;
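Then drop the original table if you no longer need it. If OriginalTable is a managed (non-external) table, this also deletes its underlying data files: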
DROP TABLE OriginalTable;
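A side effect is that anyone nosy enough to open the raw ORC files (assuming they have access to them) sees only binary gibberish instead of readable rows. Don't mistake this for security, though: ORC is an encoding, not encryption, and anyone with read access and an ORC reader such as Hive can still decode the data.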
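If you copy one of the table's data files out of HDFS (for example with hdfs dfs -get) and want to confirm what you are looking at, the ORC specification says every ORC file begins with the three ASCII bytes "ORC". Below is a minimal Python sketch of that check; the function names and sample bytes are hypothetical, and it only inspects the header rather than validating the whole file.

```python
ORC_MAGIC = b"ORC"  # per the ORC specification, every ORC file starts with these 3 bytes


def is_orc_header(data: bytes) -> bool:
    """Return True if `data` begins with the ORC magic bytes."""
    return data[: len(ORC_MAGIC)] == ORC_MAGIC


def looks_like_orc(path: str) -> bool:
    """Read only the first few bytes of a local file and check the ORC header."""
    with open(path, "rb") as f:
        return is_orc_header(f.read(len(ORC_MAGIC)))


# Hypothetical byte prefixes: the start of an ORC file vs. a row from a tab-delimited text file.
print(is_orc_header(b"ORC\x11\x00\x00"))  # True
print(is_orc_header(b"1\tcustomer_1\n"))  # False
```

On a copied file you would call looks_like_orc with its local path. A matching header is a hint, not proof; actually reading the data still takes an ORC reader.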
The other benefit is smaller files, thanks to ORC's built-in compression. For a table of about 9k rows, the file size dropped from 477,668 for the original table to 62,554 for the ORC table, roughly 7.6 times smaller.
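The size reduction can be worked out directly from the two figures quoted for the 9k-row table. This small Python sketch uses only those numbers; because it compares one size against the other, the unit the file listing reported cancels out.

```python
# File sizes quoted for the ~9k-row table (same unit for both, so it cancels out).
original_size = 477_668
orc_size = 62_554

ratio = original_size / orc_size          # how many times smaller the ORC copy is
reduction = 1 - orc_size / original_size  # fraction of space saved

print(f"{ratio:.2f}x smaller, {reduction:.1%} less space")  # 7.64x smaller, 86.9% less space
```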
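You can also pick the compression codec explicitly with the "orc.compress" table property:
STORED AS ORC TBLPROPERTIES ("orc.compress"="ZLIB");
ZLIB is already Hive's default codec for ORC, so this mostly makes the choice explicit. It would only shrink files further if your cluster's default has been changed to NONE or SNAPPY, since ZLIB generally compresses harder than SNAPPY at the cost of speed.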
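To get a feel for what a ZLIB-style codec does with row data, here is a small Python sketch using the standard zlib module, which implements DEFLATE, the algorithm behind ORC's ZLIB option. It compresses hypothetical tab-delimited rows, not an actual ORC file, so its ratio will not match the ORC figures (ORC's columnar encodings also contribute to its savings). The round trip shows the compression is lossless and needs no key to undo.

```python
import zlib

# Hypothetical tab-delimited rows, similar in shape to a text-format Hive table.
rows = "".join(f"{i}\tcustomer_{i % 50}\tactive\n" for i in range(9000))
raw = rows.encode("utf-8")

# Compress at zlib's strongest level (9); repetitive row data shrinks a lot.
packed = zlib.compress(raw, 9)
print(f"{len(raw)} -> {len(packed)} bytes")

# Decompressing restores the exact original bytes: lossless, and no key is required.
restored = zlib.decompress(packed)
print(restored == raw)  # True
```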
And that's a basic intro to ORC tables in Hadoop.