When you look at raw data, it looks like a jumbled mess.
It also looks stale.
Just a bunch of text thrown together.
I was trying to think of what it lacks.
Substance. As in descriptive adjectives.
For example, when you look at a flower what stands out?
The color, the shadows, the texture, the aroma, the touch, etc.
All of these are attributes we perceive through the five senses.
They are open for interpretation.
So a flower expert can look at a particular flower and go into great detail about it:
species, blooming schedule, related plant species, and a slew of adjectives to describe it.
Now take data. To the average person, the data looks like a jumbled mess.
They don't know what they are looking at because there are no descriptive features to look for.
An expert would know the context in which to interpret the data, how it relates to other data, what patterns to look for, which anomalies matter, etc.
He or she knows the descriptive adjectives that the casual bystander does not.
So I say what we are lacking is descriptive data, similar to an XML document, where the data is self-described by the other data contained within the doc.
You could pass this descriptive data around to any user, anywhere, anytime, and they could describe the data the same way as everyone else:
where it came from, when it was captured, by whom, what it implies, and so on.
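To make the idea a bit more concrete, here is a rough sketch in Python of what "self-described" data could look like. This is only an illustration under my own assumptions: the element names (`dataset`, `description`, `source`, `capturedAt`, `capturedBy`, `meaning`), the two helper functions, and the sample readings are all made up for the example and are not any kind of standard.

```python
import xml.etree.ElementTree as ET


def describe(values, source, captured_at, captured_by, meaning):
    """Wrap raw values in an XML document that carries its own description.

    All element names here are hypothetical, chosen only for illustration.
    """
    dataset = ET.Element("dataset")

    # The descriptive part: where, when, by whom, and what it means.
    description = ET.SubElement(dataset, "description")
    ET.SubElement(description, "source").text = source
    ET.SubElement(description, "capturedAt").text = captured_at
    ET.SubElement(description, "capturedBy").text = captured_by
    ET.SubElement(description, "meaning").text = meaning

    # The raw part: the values themselves.
    data = ET.SubElement(dataset, "values")
    for v in values:
        ET.SubElement(data, "value").text = str(v)

    return ET.tostring(dataset, encoding="unicode")


def read_description(xml_text):
    """Return the description fields of a document as a plain dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root.find("description")}


# Made-up sample data, purely for demonstration.
doc = describe(
    [72, 75, 71],
    source="greenhouse sensor 4",
    captured_at="2016-01-28T09:00:00",
    captured_by="J. Smith",
    meaning="hourly temperature in degrees Fahrenheit",
)
info = read_description(doc)
print(info["source"], "|", info["meaning"])
```

Anyone who receives `doc` can call `read_description` and get the same answers about the data as everyone else, without needing the expert in the room.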
That's the biggest problem I see when we talk about data: lack of description.
What are your thoughts - am I off the mark here or is this a giant hole in the data infrastructure?