When you hear that someone is a data scientist, you imagine a life similar to that of a major league baseball player: status, prestige, salary, pick of the jobs.
I guess what I'm wondering is once they get the job, with high salary and perks, what do they do all day?
Perhaps they report to some senior executive, work in a lab, and have unlimited compute power at their disposal, with unrestricted access to every piece of data in the org.
Or perhaps they report to an IT manager, sit in a cube, and have to submit ticket requests for access to data, then wait days or weeks for permission. Perhaps they aren't given enough computing power to do their jobs, and maybe they're locked out of key pieces of data.
I guess it could vary depending on the org.
You can take the funniest comedian on the planet, walk up to them and say "be funny," and they may have difficulty being funny on demand. Perhaps the same is true of data scientists: "Where are my insights? You've been working on this project for a while. We'd expect our sales to have increased or our costs to have dropped by now. Are you sure you know what you're doing?"
And what about project life cycle management? Are data science projects scoped out like other projects, with agile sprints and continuous delivery? Are there milestones, deliverables and phases to projects? How does a data scientist log their hours to specific projects? How are project hours determined ahead of time? And whose bucket do the hours get charged to: Marketing, Sales, Finance, Overhead?
What if their mouse stops working? Do they enter a ticket into the ticketing system? Who creates the work environments: development, test and production? Where is the source code saved? Who performs the database backups? How does code get moved to production, especially if they work in an ITIL environment?
What if the data scientist gets embedded within a particular department? Disconnected from the team of data scientists, do they still have periodic meetings with that team to compare notes and leverage the knowledge pool of coding techniques and domain knowledge? Is there a company wiki for documenting the knowledge learned, or does the domain knowledge walk out the door when the data scientist leaves?
What if the data scientist is unable to discover a true insight? How does a data scientist know which project to go after: the easiest, the most valuable, the one with the most exposure? Who maintains the queue of experiments to work on, and who prioritizes the list? Who does the quality assurance on the work before it moves to production? Does the data scientist work with a business analyst or domain knowledge expert?
How many meetings per week does a data scientist attend? What are the meetings about, and who attends them?
What if the results of a data science project are completely wrong, and management implements the recommended changes with bad downstream effects? Can the data scientist be held liable? Should data scientists purchase professional liability insurance in case they get sued?
These are just a few questions I was thinking about regarding the sexiest job of the 21st century. I still think the data scientist is the rock star of the workforce, because of the potential impact they have, the intelligence required to be successful, and because they have tentacles into every department, every software application and potential external data sets. The data scientist is the new "go-to" person of any organization. Data scientists try to tame the wild beast of data explosion, rapid advances in technology and storage capacity, programming, algorithms and domain knowledge. Beyond the hype, they still need to interact in a business with real people and real processes, keep pace with rapid change, and deliver results to an eager audience.
Quite a challenge. I think it's worth it.