The truth of the matter is that data is running wild and free in the bowels of organizations. The goal of a Data Professional is to tame this wild beast. And domesticate it.
For this to happen, one must get into the mind of data. Figure out how it flows, what hidden messages reside within, and how to extract such knowledge.
In doing so, one must become a Data Whisperer.
Any organization thinking of entering the data analysis space, which should be every org, must determine the cost of doing business. As in Return on Investment, or ROI.
How is this accomplished? There are "Fixed Costs", as in Servers, Network, Software, Developers, maintenance agreements, etc. Or you can host in the Cloud to leverage elasticity, backups, a remote workforce, etc. Some of the software is actually free, as in Open Source languages and Big Data storage.
Besides the fixed costs, you have many unknowns. Like time. How long should the project last? One way to get a handle on time is to break the project into smaller chunks delivered over time, an Agile approach, rather than Waterfall.
What is the final product of a Data Initiative? It converts raw material, Data, applies business rules, and mashes the data with other data sets to produce refined diamonds of Insight. What's the value of these Diamonds? We don't know up front. Let's say this piece of information, when applied, increases gross sales by 10%. 10% of sales for a particular product could be worth billions. Or consider saving costs: by implementing this change, we can reduce money spent by 20%. Or change this process and save 30%.
As you can see, the final result of a Data Project can fluctuate greatly. You can also see the potential can be great. So your return on investment can be calculated up front only to some degree: the fixed costs of the project are known, but they can be offset many times over by the findings and resulting changes. The impact could be staggering.
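To make the ROI reasoning a bit more concrete, here is a minimal sketch in Python. It assumes the common definition ROI = (benefit - cost) / cost over a single year, and every dollar figure in it (the fixed costs, the baseline sales, the baseline spend) is hypothetical, made up purely for illustration. The scenarios mirror the ones above: sales up 10%, spend down 20%, a process change saving 30%, plus the case where the project finds nothing actionable.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One possible outcome of a data project (all figures hypothetical)."""
    name: str
    annual_benefit: float  # extra revenue or money saved in one year


def roi(benefit: float, cost: float) -> float:
    """ROI as net gain divided by the cost of the investment."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


# Hypothetical fixed costs: servers, network, software, developers, maintenance.
fixed_costs = {
    "servers_and_network": 200_000.0,
    "software_licenses": 50_000.0,  # could drop toward 0 with open source
    "developers": 400_000.0,
    "maintenance": 50_000.0,
}
total_cost = sum(fixed_costs.values())  # 700,000 in this made-up example

# Hypothetical baselines that the percentage improvements apply to.
baseline_sales = 10_000_000.0
baseline_spend = 4_000_000.0

scenarios = [
    Scenario("sales up 10%", baseline_sales * 0.10),
    Scenario("spend down 20%", baseline_spend * 0.20),
    Scenario("process change saves 30%", baseline_spend * 0.30),
    Scenario("no actionable insight", 0.0),
]

if __name__ == "__main__":
    for s in scenarios:
        # The cost is fixed; only the benefit side of the ratio changes.
        print(f"{s.name:28s} ROI = {roi(s.annual_benefit, total_cost):+.0%}")
```

With these made-up numbers the same fixed cost yields anything from a total loss (-100%) to roughly +71%, which is the point: the cost side is knowable up front, while the benefit side depends on what the data reveals.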