In the world of data, we often hear about Data Quality.
And what that means is the data is not dirty.
So what's clean data?
Data that can produce insights.
Because the data has lots of characteristics about specific objects.
It could be census data, weather data, stock market data, archived emails, just about anything.
And a good data person can take that data set, mine it, and turn it into information.
And what if you have multiple clean data sets that you could mash together?
To really slice and dice, find patterns, and identify predictive scenarios.
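Here's a rough sketch of what mashing two clean data sets together might look like in plain Python. Everything in it is made up for illustration: the weather and taxi numbers, the city, and the helper names `mash` and `average_rides` are all hypothetical, not from any real source. It only works this neatly because both sets are clean and share the same keys.

```python
from collections import defaultdict

# Hypothetical clean data sets, invented for illustration only.
weather = [
    {"date": "day-1", "city": "Anytown", "rain_mm": 12.0},
    {"date": "day-2", "city": "Anytown", "rain_mm": 0.0},
    {"date": "day-3", "city": "Anytown", "rain_mm": 8.5},
]
taxi_rides = [
    {"date": "day-1", "city": "Anytown", "rides": 940},
    {"date": "day-2", "city": "Anytown", "rides": 610},
    {"date": "day-3", "city": "Anytown", "rides": 880},
]


def mash(left, right, keys):
    """Inner-join two lists of dicts on the given key fields."""
    # Index the right-hand rows by their key values for quick lookup.
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)

    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            # Merge the two rows into one wider record.
            joined.append({**row, **match})
    return joined


def average_rides(rows, rainy):
    """Average ride count over rainy (or dry) days; None if no rows match."""
    picked = [r["rides"] for r in rows if (r["rain_mm"] > 0) == rainy]
    return sum(picked) / len(picked) if picked else None


combined = mash(weather, taxi_rides, keys=("date", "city"))
print("avg rides on rainy days:", average_rides(combined, rainy=True))
print("avg rides on dry days:", average_rides(combined, rainy=False))
```

If the keys were dirty (misspelled cities, mixed date formats), the join would silently drop rows, which is exactly why clean data matters before you mash anything.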
Imagine how much clean data has accumulated at some of the government data centers.
Mashing Facebook, LinkedIn, YouTube, Twitter, cell phone companies, taxi and rental car companies, unlimited data sources.
Shove it in Hadoop and begin to search and discover.
And store it forever. Imagine the possibilities.
So wouldn't it be nice to have as much clean data as possible?
To derive insight, mine it over time, and predict future behavior.
That's a data scientist's dream!