8/31/2014

So What Is Predictive Modeling?

If a user requests a report, we can get that in a relatively short time.
If they want to join disparate data sources into a Data Warehouse, we can do that too.
At some point, the user has to analyze the data.  In doing so, they bring in bias, perhaps a skewed perspective, and the risk of misreading the data.
However, that's been the world of data for the past few decades for the average Data Professional.

We did have predictive models years ago.  I know from working at the bank approving loans.  My decisions were based on a score.  Where did that score come from?  It read in variables from the loan application in addition to the credit report.  It looked at age, number of years at residence and employment, as well as number of revolving loans, installment loans, bad pay history, and inquiries.  All those factors and more were sent to the model, which returned a score.  The score had a threshold: if the customer exceeded it, he/she was approved; otherwise, declined.
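To make that concrete, here is a minimal sketch in Python of what a points-based scorecard might look like.  The variables mirror the ones above, but the point values and the approval threshold are invented for illustration; a real scorecard derives them statistically from repayment history.

```python
# A hypothetical points-based credit scorecard.  The point values and
# the approval threshold are made up for illustration only.

def credit_score(applicant):
    score = 0
    score += min(applicant["age"], 60)              # age points, capped
    score += applicant["years_at_residence"] * 5
    score += applicant["years_at_employment"] * 5
    score -= applicant["revolving_loans"] * 10
    score -= applicant["installment_loans"] * 5
    score -= applicant["bad_pay_events"] * 40
    score -= applicant["recent_inquiries"] * 15
    return score

APPROVAL_THRESHOLD = 100  # hypothetical cutoff

applicant = {
    "age": 35,
    "years_at_residence": 4,
    "years_at_employment": 6,
    "revolving_loans": 3,
    "installment_loans": 1,
    "bad_pay_events": 0,
    "recent_inquiries": 2,
}

decision = "approved" if credit_score(applicant) >= APPROVAL_THRESHOLD else "declined"
print(decision)
```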

However, that predictive model was based on the statistical chance of that customer repaying the loan in full.  There were no guarantees.  My boss always said there's the chance the customer gets hit by a bus and the loan goes into default.  The score is a probability of repayment.

So if we had predictive scoring models in 1994, what's changed?

A lot.  First off, the amount of available data has blossomed.  Second, the tools to create models have entered the hands of the average person, so no PhD is required.  Third, with a little knowledge of Statistics, Data Professionals can now create end-to-end, full-life-cycle applications in a short amount of time.

Based on the new offerings from Microsoft Azure, I could log on, upload a data set, and create an application by simply dragging and dropping components onto the canvas.  I could hook them up to cleanse and transform the data, build a model, train it by running a subset of the data through to learn the weights (back-propagating to increase accuracy), then run a real data set through to score the model and output the results for analysis.
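Azure ML does all of that through drag-and-drop modules, but conceptually the same pipeline looks something like this sketch in Python with scikit-learn.  The file name and column names are hypothetical; the point is the flow: cleanse, split, train, score.

```python
# A conceptual sketch of the canvas workflow in code.  The CSV file and
# column names are hypothetical stand-ins for the uploaded data set.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Cleanse and transform: drop rows with missing values
data = pd.read_csv("loan_history.csv").dropna()
features = data[["age", "years_at_residence", "revolving_loans", "bad_pay_events"]]
label = data["repaid"]  # 1 = repaid in full, 0 = defaulted

# Train on a subset, hold the rest back for scoring
X_train, X_test, y_train, y_test = train_test_split(features, label, test_size=0.3)

# Build and train the model (this is where the weights are learned)
model = LogisticRegression()
model.fit(X_train, y_train)

# Score the model against data it has never seen
probabilities = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, probabilities))
```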

Not only that, it will seamlessly create a REST web API, allowing programmers to send data to the endpoint, have the model application score the data, and return a probability to the calling app.  What that means is the model is exposed for actual use by any number of calling applications.
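For example, a calling application might look something like this.  The endpoint URL, API key, and JSON shape below are placeholders; the real values come from the API documentation that Azure generates when the service is published.

```python
# Calling a published scoring endpoint from client code.  The URL, key,
# and response field are placeholders, not the actual Azure ML contract.
import requests

ENDPOINT = "https://example.azureml.net/services/credit-model/execute"  # placeholder
API_KEY = "your-api-key-here"                                           # placeholder

applicant = {
    "age": 35,
    "years_at_residence": 4,
    "revolving_loans": 3,
    "bad_pay_events": 0,
}

response = requests.post(
    ENDPOINT,
    json={"Inputs": applicant},
    headers={"Authorization": "Bearer " + API_KEY},
)
probability = response.json()["probability"]  # placeholder response field
print("Probability of repayment:", probability)
```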

What's an example of end-to-end model usage?  Say I'm running a business that extends credit to customers, and a person wishes to purchase a product.  I'd like to know if this applicant is worthy of credit.  So as the person applies online, I, as a programmer, send the pertinent information to the REST API through code.  The web service receives that info, sends it through the model, and returns a score to the calling app saying this potential customer, based on their specific characteristics, will be a good paying customer, so yes, extend a line of credit.  Or the opposite: do not extend credit, because his/her profile matches a subset of people who did not repay.
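The last step in the calling app is turning that probability into a yes/no decision.  Something like this, where the cutoff is a hypothetical business rule:

```python
# Turn the model's probability into a credit decision.  The 0.75 cutoff
# is a hypothetical business rule, not part of the model itself.
def credit_decision(probability, cutoff=0.75):
    if probability >= cutoff:
        return "Extend the line of credit"
    return "Decline: profile matches customers who did not repay"

print(credit_decision(0.82))  # extend credit
print(credit_decision(0.40))  # decline
```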

So I could create a model to predict if a customer is creditworthy, without having to hire PhD Mathematicians or Statisticians.  In fact, I could do it all within a web browser in a relatively short amount of time.

And this is just the tip of the iceberg.  With the growing data sets available, the possibilities are endless.  But what makes it so unique is that Microsoft Azure Machine Learning offers the entire package online: a single user login, collaboration, and an end-to-end solution with easy-to-use software and good documentation.

This is where the world of Data is heading.  I think it's incredible.