Although I programmed a computer in the early 1980s on the IBM PC, I didn't major in computers in college. After graduation, I gravitated toward computers via reporting, specifically Crystal Reports.
Back in 1995, there were no resources for learning Crystal Reports. No online books, no user groups, no forums. How did I learn? We had a consultant in town from one of the bank acquisitions, so we got a conference room for an entire day, went over every single feature in Crystal Reports Developer 5.0, and I picked his brain. That was my formal training.
Data Science wasn't a thing then; the internet was just ramping up, and Machine Learning as an occupation didn't really exist. I did know about statistical models, though. As a bank underwriter, I saw us move to a new software package that used models to predict outcomes based on age, gender, years on the job, years at residence, plus credit reports. The model was built on historical data, and the software cost a bundle.
Fast forward to today. The number of available online tutorials is staggering. If you want to learn a new technology, go find a course. Right now I'm attending the Statistical Learning course from Stanford University: no cost, self-paced, with a free statistics e-book plus examples. Although the math is at a high level, you can learn a lot from the videos. The nice part about this particular course is that it uses the R language (free, an industry standard, with thousands of built-in functions) rather than MATLAB.
During college, I was more or less a business major for the first two years. I took a statistics class, enjoyed it a lot, and tucked it away for 30 years. Turns out statistics is the bread and butter of Data Science. By that I mean you have to know what algorithms are available, what each one does, and how to apply, tune, and analyze them. Today's software does much of the heavy lifting, so you don't necessarily have to know the underlying math.
Statistical learning, to a large degree, is a black box: input, functions, output. Regression. Classification. Clustering. Supervised learning, where you have labeled data and predict a target outcome. And unsupervised learning, where the data is not labeled and you just try to organize it into buckets.
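To make the supervised/unsupervised distinction concrete, here is a minimal sketch in Python using only toy data I made up and the standard library: a least-squares regression line (supervised, labeled data) and a naive two-bucket k-means loop (unsupervised, no labels). The numbers and the simplified algorithms are illustrative assumptions, not anything from the course.

```python
# --- Supervised learning (regression): labeled data, predict a target ---
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]   # labels, roughly following y = 2x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Ordinary least squares slope and intercept for y = slope * x + intercept
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x
print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")

# --- Unsupervised learning (clustering): no labels, organize into buckets ---
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.8]   # two obvious groups, no labels
centers = [points[0], points[3]]           # naive initial guesses
for _ in range(10):                        # a few k-means iterations
    buckets = [[], []]
    for p in points:
        # assign each point to its nearest center
        i = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
        buckets[i].append(p)
    # move each center to the mean of its bucket
    centers = [sum(b) / len(b) for b in buckets]
print(f"cluster centers: {centers[0]:.1f} and {centers[1]:.1f}")
```

Notice the difference: the regression had the answers (the y values) to learn from, while the clustering loop had to discover the two groups on its own.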
Of course, you have to have clean, non-duplicated data, a valid question to answer, an understanding of the available statistical methods, and the ability to apply and adjust them as necessary, in order to formulate a result that is easy to interpret and fairly accurate. The result shouldn't be too accurate, or "overfit", nor too inaccurate, or "underfit". You want some degree of error so the model generalizes to future data sets and still gives accurate results.
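A tiny, hedged sketch of why "too accurate" hurts, using made-up numbers and only the Python standard library: a "model" that memorizes its training points gets zero training error but does poorly on new data, while a simple line that captures the general trend holds up.

```python
# Toy data following roughly y = 2x, with a little noise (illustrative only).
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1)]
test  = [(5, 9.8), (6, 12.2)]

# Overfit "model": memorize the training points exactly.
memory = dict(train)
def memorizer(x):
    # return the memorized label, falling back to the nearest memorized point
    return memory.get(x, memory[min(memory, key=lambda k: abs(k - x))])

# Simpler model: the straight line y = 2x (captures the general trend).
def line(x):
    return 2 * x

def mean_abs_error(model, data):
    return sum(abs(model(x) - y) for x, y in data) / len(data)

print("memorizer train error:", mean_abs_error(memorizer, train))  # perfect fit
print("memorizer test error: ", mean_abs_error(memorizer, test))
print("line      test error: ", mean_abs_error(line, test))
```

The memorizer is "too accurate" on the data it has seen and falls apart on the data it hasn't; the line carries some training error but generalizes.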
When I worked with Crystal Reports in 1995, not many people were doing reporting for a living. Basically a one-eyed man in the land of the blind. The market has since grown, on steroids, and it's flooded with resources, vendors, and tools, and new markets have emerged. It's all due to the fact that software is now in the hands of everyday people, along with improvements in hardware and processing power and the availability of data sets and online courses to train the next generation.
In this field, we never stop learning.
And so it goes~!
Thanks for reading!