Having data means nothing unless you can interpret it. That's the phrase I keep hearing lately.
And it's probably true.
Data can tell a story. But on its own, data is merely an accumulation of raw pieces, sitting in relational tables or flat files or Excel docs, or in no structure at all.
Even so, something's missing.
Data needs to be more descriptive, as in attributes that describe the data. In language we have nouns: persons, places and things. And we have verbs: things doing something, in different tenses: past, present and future.
We also have adjectives and adverbs, which describe the nouns and verbs. Book. What kind of book? Subject? Size? Shape? Contains what? Author? Date written? Might be valuable information. Describes the book.
Run. Run where? By whom? When? How far? Started and ended where? Describes the "run".
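Here's a rough sketch of that idea in Python, just to make it concrete. The class names, field names and the "describes" key are all made up for illustration, and the sample values are placeholders. The point is only that each attribute carries its own description along with it.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Book:
    # A noun. Each attribute carries a note saying what it describes.
    # The "describes" metadata key is a hypothetical convention.
    title: str = field(metadata={"describes": "what the book is called"})
    author: str = field(metadata={"describes": "who wrote it"})
    year_written: int = field(metadata={"describes": "when it was written"})


@dataclass
class Run:
    # A verb. Same trick: who, how far, from where, to where.
    runner: str = field(metadata={"describes": "who ran"})
    distance_km: float = field(metadata={"describes": "how far"})
    start: str = field(metadata={"describes": "where it started"})
    end: str = field(metadata={"describes": "where it ended"})


def describe(record):
    """Return one line per attribute: name, value, and what it describes."""
    return [
        f"{f.name} = {getattr(record, f.name)!r} ({f.metadata['describes']})"
        for f in fields(record)
    ]


if __name__ == "__main__":
    # Placeholder values, not real data.
    for line in describe(Book("Example Title", "A. Author", 2001)):
        print(line)
    for line in describe(Run("R. Runner", 5.0, "Park gate", "Town square")):
        print(line)
```

Hand `describe` any record built this way and it answers the "what kind? by whom? how far?" questions without anyone writing a query.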
It seems we need an inherent way of self-describing data. If someone hands you some data, wouldn't it be nice if you could run it through an interpreter, or compiler, or some data framework that loads up the data in descriptive detail, without having to write queries, build joins, or merge data sets?
Thanks for loading this data. This data was created at this location, at this time, by this application. The data describes a set of purchases of books. Here's information about the books. The author. Who purchased them. How much they paid. What else they bought. The customer demographics. And on and on.
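A minimal sketch of what that might look like, assuming the data set ships as JSON with its own description attached. The layout ("metadata", "fields", "records"), the function name `tell_story` and every value in the sample are hypothetical. This isn't an existing standard or framework, just one way the pieces could fit.

```python
import json

# A hypothetical self-describing data set. All names and values are placeholders.
RAW = """
{
  "metadata": {
    "created_at": "2016-01-25T10:00:00Z",
    "created_in": "example-store-01",
    "created_by": "example-pos-app",
    "describes": "a set of purchases of books"
  },
  "fields": {
    "title": "title of the book purchased",
    "author": "author of the book",
    "customer": "who purchased it",
    "price_usd": "how much they paid, in US dollars"
  },
  "records": [
    {"title": "Example Book A", "author": "A. Author", "customer": "C-001", "price_usd": 12.5},
    {"title": "Example Book B", "author": "B. Author", "customer": "C-002", "price_usd": 8.0}
  ]
}
"""


def tell_story(dataset):
    """Build a plain-language description from the data set's own metadata."""
    meta = dataset["metadata"]
    lines = [
        "Thanks for loading this data.",
        f"It was created at {meta['created_in']}, at {meta['created_at']}, "
        f"by {meta['created_by']}.",
        f"It describes {meta['describes']}.",
        f"It holds {len(dataset['records'])} records with these attributes:",
    ]
    # Each field explains itself, so no schema lookup or join is needed.
    for name, meaning in dataset["fields"].items():
        lines.append(f"  {name}: {meaning}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(tell_story(json.loads(RAW)))
```

The loader knows nothing about books or purchases. Everything it says comes from the data set itself, which is the whole idea.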
Plug and play. Insert data set. Voilà! Here's the story about the data. And of course we could interrogate it further, with queries or natural language questions, or put it into charts and graphs and visualizations, and compare it against similar prior data sets.
Presto magic. Self-describing data. Pieces of data with self-describing attributes that can be loaded into a pre-built framework by anyone, anywhere, so long as they have permissions.
And why not share some of these self-describing data sets? Put them out on the web to be consumed through REST web services or SOAP calls, or queried remotely.
Data without interpretation is like stacked bundles of hay. Doesn't do much. It's when you understand the data that it becomes valuable. Have the data tell you its story, by labeling your data with self-describing attributes that can be interpreted by you or by machines.
And then have machines crawl those data sets like a network of self-describing knowledge. A world digital encyclopedia, available 24/7 from anywhere. Just expose the data sets meant for public consumption and keep the private data private.
That's the piece that's missing from the data-centric world, in my opinion.
Self-describing data that tells you its story.