There are all sorts of data.
You have customer information, perhaps gathered from a website such as eBay.
In this case, web developers wrote a web app containing business logic. It lets users perform an action, and that action most likely gets saved to a Relational Database.
That info could then be moved into a Data Warehouse through ETL (extract, transform, load).
Either one could have reports written against it for consumption by employees at eBay.
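To make the ETL step a bit more concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table names (orders, customer_totals), the columns, and the run_etl function are all made up for illustration, and an in-memory database stands in for both the Relational Database and the Data Warehouse. A real pipeline would look quite different.

```python
import sqlite3

def run_etl(source_rows):
    """Extract order rows, total them per customer, and load a summary table.

    All table and column names here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")

    # Stand-in for the web app's transactional database.
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", source_rows)

    # Extract: read the raw rows.
    rows = conn.execute("SELECT customer, amount FROM orders").fetchall()

    # Transform: aggregate the amounts per customer.
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0.0) + amount

    # Load: write the summary into a reporting table.
    conn.execute(
        "CREATE TABLE customer_totals (customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT INTO customer_totals VALUES (?, ?)", sorted(totals.items())
    )

    result = conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()
    conn.close()
    return result
```

A report developer would then write reports against the summary table rather than the raw orders.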
Another type of website could be Twitter. It doesn't offer much functionality; it's more of a data gathering / input site. The data gets stored in mega-databases and can be queried by a report developer.
Another type of data would come from a call center Phone Switch, which registers the flow of incoming calls, how they got routed, the duration of each leg of the call, dropped-call percentages, etc. There is very little user info here, as everything is automated (users don't even know this info is recorded).
Another type of data could come from a non-B2B site, such as a gov't site where users poke around looking for info. Their clicks could be stored in a database to track the flow of users, for example. This data could be mined to observe user behavior and to identify trends.
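As a rough illustration of mining click data for user flow, here is a small Python sketch that counts how often users move from one page to the next. The input format (a time-ordered list of session id and page pairs) and the function name count_transitions are assumptions made up for this example, not anything a particular site records.

```python
from collections import Counter

def count_transitions(clicks):
    """Count page-to-page moves within each session.

    clicks: iterable of (session_id, page) tuples, already in time order.
    The record layout is hypothetical.
    """
    last_page = {}           # most recent page seen per session
    transitions = Counter()  # (from_page, to_page) -> number of moves

    for session_id, page in clicks:
        previous = last_page.get(session_id)
        if previous is not None:
            transitions[(previous, page)] += 1
        last_page[session_id] = page

    return transitions
```

Calling transitions.most_common(5) on the result would list the five most frequent paths, which is one simple way to start spotting trends.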
However, with all this data, how does one extract meaningful information to provide value to decision makers?
Well, let's track the flow of structured data:
1. An Internal Customer has a need for a website or info gathering.
2. The Business Analyst gathers the specs.
3. The Web Developers / Vendors create a website.
4. The Database Admin or Developer creates a database to store the info.
5. Someone requests reports.
6. The Business Analyst gathers the Report Specs.
7. The Report Developer writes the Reports.
8. QA verifies the Reports.
9. Change Management pushes the Reports to production.
10. Someone reviews the data looking for patterns.
11. A Decision Maker presents the findings to Top Execs.
12. Someone makes a business decision.
So, we've tracked the data flow from design to decision.
Now, where is the Data Scientist in this scenario? Is he/she writing the reports in #7? Or looking for patterns in #10?
I don't know, you tell me.
In this case, I think the role of the Data Scientist is the accumulation of all the pieces of the puzzle, which may or may not be handled by a single person.
How about Semi-Structured Data:
Almost every organization has emails, and most of those emails are archived. Someone could mine those emails for patterns, trends, aggregate data, etc. This may or may not require a Data Scientist.
Now how about Un-Structured Data:
Let's say your organization has a network, and on that network you have files: PDF, Word, Excel, Text, etc. Those files can be consumed by open source Big Data software such as Hadoop, for example, as name-value pairs. That data can then be parsed, usually not in real time, and a person could extract nuggets of info from all that un-structured data.
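To show the name-value pair idea, here is a small sketch in plain Python that mimics the map and reduce steps of a word count. It is not Hadoop code and uses no Hadoop API. It assumes the files have already been converted to plain text strings, and the function names are made up for this example.

```python
from collections import defaultdict

def map_words(text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'()")
        if word:
            yield (word, 1)

def reduce_counts(pairs):
    """Reduce step: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def word_counts(documents):
    """Run the map step over every document, then reduce all the pairs."""
    pairs = (pair for text in documents for pair in map_words(text))
    return reduce_counts(pairs)
```

On a real cluster the map and reduce steps would run in parallel across many machines, but the shape of the data (pairs in, totals out) is the same.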
So to summarize, I don't necessarily see a definite need for a Data Scientist in every organization. Sometimes, for predictive analytics or data mining, where statistical calculations are required, then yes, a Data Scientist may be required.
That's what I think (at the moment).