Perhaps Data Science is a Team Sport

How do the roles of a Data Scientist break down?

Well, you have to know how to prepare data.

And program.

And have domain knowledge.

And know statistics.

Which is the most important?  Well, you have to know them all.

For data, you have to know relational file structures, the SQL language, and perhaps Big Data and NoSQL.

For programming, you have to know data cleansing, file parsing, string manipulation, merging data sets, and scraping data from the web.
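To make a few of those programming chores concrete, here is a minimal sketch of parsing, string cleanup, and merging two data sets using only the Python standard library. The CSV contents, column names, and the helper functions `clean_name`, `load_customers`, and `total_by_customer` are all made up for illustration; real data would come from files, a database, or a web scrape.

```python
import csv
import io

# Hypothetical raw inputs, inlined so the example is self-contained.
CUSTOMERS_CSV = """id,name
1,  alice   SMITH
2,Bob Jones
3,
"""

ORDERS_CSV = """customer_id,amount
1,19.99
2,5.00
1,7.50
"""


def clean_name(raw):
    """Trim whitespace, collapse repeated spaces, and title-case a name."""
    return " ".join(raw.split()).title()


def load_customers(text):
    """Parse the customer CSV into {id: name}, dropping rows with no name."""
    customers = {}
    for row in csv.DictReader(io.StringIO(text)):
        name = clean_name(row["name"])
        if name:
            customers[int(row["id"])] = name
    return customers


def total_by_customer(customers, orders_text):
    """Merge orders onto customers by id and sum the amounts per name."""
    totals = {name: 0.0 for name in customers.values()}
    for row in csv.DictReader(io.StringIO(orders_text)):
        name = customers.get(int(row["customer_id"]))
        if name is not None:  # skip orders for unknown customers
            totals[name] += float(row["amount"])
    return totals


customers = load_customers(CUSTOMERS_CSV)
totals = total_by_customer(customers, ORDERS_CSV)
print(totals)
```

Even this toy version has to decide what to do with a blank name and with orders that match no customer, which is where most of the real cleansing effort tends to go.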

For domain knowledge, you have to know the industry, its norms as well as its caveats, the processes, the key decision makers, and the stakeholders.

For statistics, you have to know the high-level algorithms, how and when to apply each one, how to interpret the results, and how to work with outliers, sample sets, and standard deviations.
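As one small example of the statistics side, here is a sketch that flags outliers by how many sample standard deviations they sit from the mean. The function name `find_outliers`, the sample readings, and the cutoff of 2.0 are assumptions chosen for illustration, not a rule; the right threshold (and whether this method fits at all) depends on the data.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` sample standard deviations from the mean.

    Needs at least two values, since statistics.stdev raises otherwise.
    """
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical sample: seven similar readings and one suspicious one.
readings = [10, 12, 11, 13, 12, 11, 10, 95]
print(find_outliers(readings))
```

Note that a single extreme value inflates the standard deviation it is being measured against, which is exactly the kind of caveat you have to know before trusting the result.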

And you must be a storyteller, able to describe your findings in easily digestible formats.  You should know how to document your solutions: the methods you used to prepare the data, the formulas you applied, and the results, all in a concise document for distribution.

And you may be required to automate your findings into a working solution, such as an exposed web service, so others can query your model and receive the statistical probability of possible outcomes based on the parameters they send.

So who has all these skills?  Data, Programming, Industry Experience, MBA Process Flow, Statistics, Project Management, Documentation and Public Speaking.  Even if you have all these skills, I bet one or more technologies have changed since you began reading this blog.

Perhaps Data Science is a team sport.
