They say raw data is not meaningful until it's interpreted for value.
Have you ever opened an email with attachments? You see an icon, maybe PDF or Word or Excel. Other than the file name, you have no idea what the document contains.
What if all documents had an embedded file or hidden data compartment to store pertinent information? Born-on date, by whom, when, computer name & IP address, location, etc.
An encyclopedia of who did what, when, and where.
The info could perhaps be stored in an XML file. All applications would have write access to add to it: who touched the document, for how long, when, and where. The result is a continuous audit trail, readable by any application that can read the XML history file.
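To make the idea a bit more concrete, here is a rough sketch of what such an XML history file might look like if an app wrote it with Python's standard library. The element and attribute names (`history`, `event`, `action`, `user`, `app`, `host`, `when`) and the sample users and machines are made up for illustration; no such industry standard exists that I know of.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def append_event(root, action, user, app, host, when=None):
    """Add one audit event to the history; existing events are left untouched."""
    when = when or datetime.now(timezone.utc).isoformat()
    ET.SubElement(root, "event", {
        "action": action,  # e.g. created, opened, edited
        "user": user,
        "app": app,
        "host": host,
        "when": when,
    })
    return root


def new_history(created_by, app, host):
    """Start the hidden history file with the document's 'born on' event."""
    root = ET.Element("history")
    return append_event(root, "created", created_by, app, host)


# Hypothetical usage: one app creates the document, another opens it later.
history = new_history("alice", "WordProcessor", "LAPTOP-01")
append_event(history, "opened", "bob", "Viewer", "DESKTOP-07")

# Any app that can parse XML can read the trail back.
xml_text = ET.tostring(history, encoding="unicode")
for event in ET.fromstring(xml_text).findall("event"):
    print(event.get("when"), event.get("user"), event.get("action"))
```

Every app only ever appends an `event`, so the file reads as a timeline from creation onward.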
Okay, what else? How about applying the same logic to data? Where was this piece of information 'born'? When, by whom, by what application? The descriptive information would be embedded within the data and read or written only for audit trail purposes, with no way to modify items from the past.
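How would you stop someone from rewriting the past? One possible approach (my assumption, not part of any existing standard) is a hash chain: each audit entry stores a hash of its own content plus the previous entry's hash, so changing an old entry breaks every link after it. A minimal sketch, with hypothetical function and field names:

```python
import hashlib
import json


def _digest(record, prev_hash):
    """Hash the record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_entry(trail, record):
    """Append a record to the trail, linking it to the previous entry."""
    prev_hash = trail[-1]["hash"] if trail else ""
    entry = {"record": record, "prev": prev_hash, "hash": _digest(record, prev_hash)}
    trail.append(entry)
    return entry


def verify(trail):
    """Return True only if no past entry has been altered or reordered."""
    prev_hash = ""
    for entry in trail:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(entry["record"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical usage: a value is 'born' in one app and read by another.
trail = []
add_entry(trail, {"action": "born", "by": "alice", "app": "OrderEntry"})
add_entry(trail, {"action": "read", "by": "bob", "app": "Reporting"})
print(verify(trail))  # the untouched trail checks out
```

This only makes tampering detectable. It doesn't physically prevent it, and someone who rewrites the whole chain from scratch would still get through without some outside anchor.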
Okay, what else? How about each set of data carrying its own descriptive data: what's contained in each table, database, field, etc.? When I receive the file, my app knows how to read the hidden file and show it on screen, and it automatically lets me know which of my current data sets the data could potentially map to. For example: "This file contains data on products for the South East region of the US, first quarter 2016."
There wouldn't need to be a specialized IT person to import the data, as each application knows the table structure, the fields, the row count, etc., so every app can just 'suck' in the data without any thought or human intervention.
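Here is a small sketch of how a receiving app might use that descriptive data. The manifest layout, the field names, and the matching rule (plain overlap of field names) are all assumptions on my part; a real version would need something smarter than exact name matches.

```python
# Hypothetical descriptive data that travels with the file.
manifest = {
    "description": "Products, South East region of the US, first quarter 2016",
    "table": "products",
    "row_count": 1250,
    "fields": [
        {"name": "product_id", "type": "int"},
        {"name": "product_name", "type": "str"},
        {"name": "region", "type": "str"},
        {"name": "units_sold", "type": "int"},
    ],
}

# Hypothetical data sets the receiving app already knows about.
local_tables = {
    "inventory": ["product_id", "product_name", "warehouse"],
    "sales": ["product_id", "region", "units_sold", "quarter"],
    "employees": ["employee_id", "name"],
}


def suggest_mappings(manifest, local_tables):
    """List local tables that share field names with the incoming data."""
    incoming = {field["name"] for field in manifest["fields"]}
    matches = {}
    for table, columns in local_tables.items():
        shared = incoming & set(columns)
        if shared:
            matches[table] = sorted(shared)
    return matches


print(manifest["description"])
for table, shared in suggest_mappings(manifest, local_tables).items():
    print(f"could map to {table} on {', '.join(shared)}")
```

The app never has to open the data itself to make the suggestion. It only reads the description that rides along with it.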
They say ETL is the toughest part of working with data. Well, this would help solve that issue, as well as automate things a bit and speed up process time, or 'time to insight'.
Data is completely stupid because it's non-descriptive. Well, I say make it descriptive, by having apps follow an industry standard to keep track of the data over time through internal hidden files that keep an audit trail.
And guess what: if the data got hacked and published on the web, the hidden file would know who opened the file, where, when, computer name, IP address, etc.
I don't know, I've thought about this before: applying namespaces to all data. Not just server, database, table, field, but a bunch of other identifying factors as well. Also keep an audit trail. And there you have it.
And so it goes~!