Scenario: Try to guess what sport I'm talking about.
"He touched the ball 3 times"
Hmm. Could be football, soccer, baseball, basketball, rugby, bowling.
It couldn't be archery, badminton, darts, wrestling.
We have enough information to rule some sports out, but not enough to settle on one.
"He effortlessly passed the ball each time."
We can further reduce the list of possible sports by eliminating bowling and baseball, because we now have more information to illuminate the underlying answer.
"His last touch, he scored with a header on goal."
Finally, we have enough information to know which sport it is: soccer.
We had to interpret the data based on our current knowledge and understanding of the domain to arrive at the correct answer.
We turned information into knowledge by asking questions and iterating over time, refining our interpretation based on our current knowledge and experience.
We were never told specifically what sport was being played. We used our intuition.
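The narrowing-down above can be sketched in a few lines of Python. This is only an illustration of the idea, not a real classifier: the attribute names (uses_ball, passing, header_goal) and their True/False values are assumptions I made up to mirror the story.

```python
# Hypothetical attribute table. The True/False values are illustrative
# assumptions chosen to mirror the story, not an authoritative taxonomy.
SPORTS = {
    "football":   {"uses_ball": True,  "passing": True,  "header_goal": False},
    "soccer":     {"uses_ball": True,  "passing": True,  "header_goal": True},
    "baseball":   {"uses_ball": True,  "passing": False, "header_goal": False},
    "basketball": {"uses_ball": True,  "passing": True,  "header_goal": False},
    "rugby":      {"uses_ball": True,  "passing": True,  "header_goal": False},
    "bowling":    {"uses_ball": True,  "passing": False, "header_goal": False},
    "archery":    {"uses_ball": False, "passing": False, "header_goal": False},
    "badminton":  {"uses_ball": False, "passing": False, "header_goal": False},
    "darts":      {"uses_ball": False, "passing": False, "header_goal": False},
    "wrestling":  {"uses_ball": False, "passing": False, "header_goal": False},
}

# Each clue from the story, paired with the (assumed) attribute it implies.
CLUES = [
    ("He touched the ball 3 times", "uses_ball"),
    ("He effortlessly passed the ball each time", "passing"),
    ("His last touch, he scored with a header on goal", "header_goal"),
]


def narrow(candidates, attribute):
    """Keep only the candidate sports for which the attribute holds."""
    return {name for name in candidates if SPORTS[name][attribute]}


def guess_sport(clues):
    """Apply the clues one at a time, recording what remains after each."""
    candidates = set(SPORTS)
    history = []
    for text, attribute in clues:
        candidates = narrow(candidates, attribute)
        history.append((text, sorted(candidates)))
    return history


history = guess_sport(CLUES)
for text, remaining in history:
    print(f"{text!r} -> {remaining}")
```

Notice that the domain knowledge lives in the table, not in the loop. Someone still had to know that you score with a header in soccer; the code only applies what a person already understood.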
When companies initiate a project to build a data warehouse, they are building an infrastructure to align the data and allow for interpretation, so they can find answers to questions and act upon them to streamline processes, reduce costs or increase sales.
Getting to "action" is not built into the solution. That takes human input.
If you don't have someone to interpret the data into insight, it's just a bunch of information.
You may know that a sport is being played, and you may rule out certain sports based on the data presented, but to find the correct answer you will need to do the interpretation yourself.
Enter Data Scientist, Stage Left:
Data Scientists can provide that valuable insight. Their role is to align the data, create the reports AND do the analysis.
Going beyond the "Traditional Data Warehouse" role, they run algorithms to analyze the data and apply their domain knowledge to recognize trends and make recommendations.
In contrast, a DW developer only has to understand the business flow well enough to model the data warehouse, apply business rules and possibly create Reports and Dashboards. The interpretation is up to the Data Warehouse owners.
Data Warehouse vs. Data Scientist:
Data Warehouse developers don't necessarily need to know statistics, algorithms and mathematics, whereas a Data Scientist does. When you hire a Data Scientist, you are essentially getting analysis, interpretation and recommendations based on the current data sets, perhaps mashed up with external data.
The Data Warehouse is kind of like doing your own taxes: you had better have internal staff who know the rules, look for deductions and keep up with the ever-changing rules to stay compliant.
Hiring a Data Scientist is like hiring a CPA. They know the intricate rules, they can help determine the best strategies by analyzing your financial picture for the prior year, and they can make recommendations for future financial arrangements.
Nuggets of Gold:
Why do we go through the motions of building Data Warehouses and analyzing data? To find insights. Insights are the nuggets of gold, lying deep within your databases, waiting patiently to be mined, either by internal staff or a data scientist. When mining for insights, you may find pieces of gold here or there, but you may also find veins of rich information. Wouldn't that be nice?
So get digging~!