How do the skills of a Data Scientist break down?
Well, you have to know how to prepare data. You also need programming, statistics and domain knowledge.
Which is the most important? Well, you have to know them all.
For data, you have to know relational structures, SQL, and perhaps Big Data and NoSQL.
For programming, you have to know data cleansing, file parsing, string manipulation, merging data sets and scraping data from the web.
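To make a couple of those programming tasks concrete, here is a minimal sketch of string cleanup and merging two data sets, using only the Python standard library. The sample data, column names and helper functions are all made up for illustration; real projects often reach for a library such as pandas instead.

```python
import csv
import io

# Hypothetical sample data, invented for this illustration.
CUSTOMERS_CSV = """id,name
1,   Alice Smith
2,bob JONES
3,Carol White
"""

ORDERS_CSV = """customer_id,amount
1,19.99
2,5.00
2,12.50
"""


def parse_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_name(raw):
    """Collapse stray whitespace and normalize capitalization."""
    return " ".join(part.capitalize() for part in raw.split())


def merge_totals(customers, orders):
    """Attach each customer's order total (0.0 if they have no orders)."""
    totals = {}
    for order in orders:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + float(order["amount"])
    return [
        {
            "id": c["id"],
            "name": clean_name(c["name"]),
            "total": round(totals.get(c["id"], 0.0), 2),
        }
        for c in customers
    ]


merged = merge_totals(parse_rows(CUSTOMERS_CSV), parse_rows(ORDERS_CSV))
for row in merged:
    print(row)
```

Carol has no orders, so she is kept with a total of 0.0 rather than dropped, which is the kind of decision you should write down when you document how the data was prepared.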
For domain knowledge, you have to know the industry, its norms as well as its caveats, the processes, the key decision makers and the stakeholders.
For statistics, you have to know the high-level algorithms, how and when to apply each one and how to interpret the results. You also have to look for outliers and understand sample sets and standard deviations.
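As a small example of the statistics side, here is a sketch that computes a mean and a sample standard deviation, then flags outliers by how many standard deviations they sit from the mean. The readings and the cutoff of 2 standard deviations are assumptions picked for the illustration, not a rule; the right cutoff depends on your data and your domain.

```python
import statistics

# Hypothetical readings, invented for this illustration.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 95]

mean = statistics.mean(readings)
stdev = statistics.stdev(readings)  # sample standard deviation (n - 1)

# Assumed cutoff: flag anything more than 2 standard deviations from the mean.
Z_THRESHOLD = 2.0


def z_score(value):
    """Distance of a value from the mean, in standard deviations."""
    return (value - mean) / stdev


outliers = [r for r in readings if abs(z_score(r)) > Z_THRESHOLD]

print("mean:", round(mean, 2))
print("stdev:", round(stdev, 2))
print("outliers:", outliers)
```

Notice that a single extreme value drags both the mean and the standard deviation upward, which is one reason interpreting the numbers matters as much as computing them.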
And you must be a storyteller, able to describe your findings in easily digestible formats. You should know how to document your solutions: the methods you used to prepare the data, which formulas you applied and the results, all in a concise document for distribution.
And you may be required to automate your findings into a working solution, such as an exposed web service, so others can query your model and receive the statistical probability of possible outcomes based on the parameters they send.
So who has all these skills? Data, Programming, Industry Experience, MBA Process Flow, Statistics, Project Management, Documentation and Public Speaking. Even if you have all these skills, I bet one or more technologies have changed since you began reading this blog.
Perhaps Data Science is a team sport.