2/15/2017

Intro to Statistical Learning Notes from Online Course

When thinking about machine learning, there's a lot going on.
 
Inference attempts to understand the relationship between the predictors and the results.  If we send in a value of 10 for parameter 1, the result is something.  If we send in a value of 20, the result is something else.
 
Prediction attempts to fit the model so that the relationship between the predictors and the response can be used to accurately predict future results.
 
Both of these are included in Supervised Learning, which typically has both Predictors and a Response.  Linear Regression and Logistic Regression are classic versions.  Newer techniques include GAMs, Boosting, and Support Vector Machines.
 
The alternate approach is known as Unsupervised Learning.  UL also has Predictors but no Response.  Basically it attempts to organize the data into buckets to understand relationships and patterns, a process known as Clustering or Cluster Analysis.
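
To make Clustering a little more concrete, here's a quick sketch in Python, assuming scikit-learn and made-up data (the values and cluster count are mine, not the course's):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: 6 observations with 2 predictors each, no response column.
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

# Ask K-Means to organize the observations into 2 buckets (clusters).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           # cluster assignment for each observation
print(kmeans.cluster_centers_)  # the center of each bucket
```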
 
There is actually another approach known as Semi-Supervised Learning which combines the two.
 
Another point of interest is the differentiation between Flexibility and Interpretability.
 
Some methods are restrictive and inflexible, yet they are easy to interpret, such as Least Squares or the Lasso.  These are typically associated with Inference and Linear Models.
 
Other methods are flexible, like thin plate splines, yet more difficult to interpret.  Flexible models are associated with Splines and Boosting methods, and seeing the relationship between predictors and results is rather difficult.
 
Parametric Methods take a two-step approach: 1. assume a functional form for the relationship between the data points, such as linear; 2. apply a procedure to fit or train the model using training data.  One possible effect is overfitting the model, when the results look too accurate because they account for the noise or errors too closely.
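
To see the two steps in code, a minimal sketch assuming scikit-learn and made-up numbers:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: one predictor, one response.
X_train = np.array([[10.0], [20.0], [30.0], [40.0]])
y_train = np.array([15.0, 25.0, 33.0, 45.0])

# Step 1: assume a linear form, y = b0 + b1 * x.
# Step 2: fit (train) the parameters b0 and b1 using the training data.
model = LinearRegression().fit(X_train, y_train)

print(model.intercept_, model.coef_)  # the estimated b0 and b1
print(model.predict([[25.0]]))        # predicted response for a new value
```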
 
Non-Parametric Methods attempt to estimate the data points as closely as possible and typically perform better with more data.  A Thin Plate Spline is one method for fitting the data.  It too can be overfit.
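
A rough sketch of a thin plate spline fit, assuming SciPy's RBFInterpolator and made-up data (just one way to fit such a spline, not necessarily how the course does it):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Hypothetical noisy data: 50 observations of two predictors and a response.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, size=50)

# smoothing=0 would interpolate every point exactly (overfitting the noise);
# a larger smoothing value relaxes the fit toward a smoother surface.
spline = RBFInterpolator(X, y, kernel='thin_plate_spline', smoothing=1e-3)

print(spline(np.array([[0.5, 0.5]])))  # estimated response at a new point
```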
 
Another topic is the distinction between Quantitative and Qualitative variables.
 
Quantitative involves numerical values, as the word "quantity" itself helps you remember.  These are Regression problems, such as Least Squares Linear Regression.
 
Qualitative has Classes or Categories.  These classes are sometimes binary, as in True/False, Yes/No, or Male/Female, and sometimes multi-class, as in Group 1, Group 2, or Group 3.  These are Classification problems, such as Logistic Regression.
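
A quick Classification sketch, assuming scikit-learn and made-up binary labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: one predictor, a binary (Yes/No-style) response.
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = "No", 1 = "Yes"

clf = LogisticRegression().fit(X, y)

print(clf.predict([[4.5]]))        # predicted class for a new value
print(clf.predict_proba([[4.5]]))  # estimated probability of each class
```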
 
The main takeaway is that there is no single silver bullet to apply to every data set.  It's the responsibility of the analyst to decide which approach works best for a particular situation, as results can vary.
 
For Regression problems, the Mean Squared Error, or MSE, can determine the quality of the results.  It's most meaningful when computed on test data rather than the training data.  The lower the MSE the better, as fewer errors translate to more accuracy.
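
A minimal sketch of computing MSE on held-out test data versus the training data, assuming scikit-learn and made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical data: one predictor with a noisy linear response.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 1, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# MSE = average of (actual - predicted)^2; lower is better.
print(mean_squared_error(y_train, model.predict(X_train)))  # training MSE
print(mean_squared_error(y_test, model.predict(X_test)))    # test MSE
```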
 
There are Reducible Errors, which come from the function and parameter part of the equation, and Irreducible Errors, the downstream errors known as Epsilon.  Reducible Errors can be tweaked; Irreducible Errors can not.
 
One way to reduce the Reducible Errors is to account for Bias and Variance.  Flexible models tend to have higher variance (and lower bias), while inflexible models tend to have lower variance (and higher bias).  All regression models should retain some error on the training data; a model that fits the training data perfectly has likely overfit.
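
To see the trade-off in action, here's a sketch comparing an inflexible fit (degree 1) with a very flexible one (degree 10) on the same made-up data, using plain NumPy polynomials:

```python
import numpy as np

# Hypothetical data: a sine curve plus noise, split into train and test sets.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=30)
x_test = np.sort(rng.uniform(0, 1, 30))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, size=30)

for degree in (1, 3, 10):
    coefs = np.polyfit(x, y, degree)  # higher degree = more flexible model
    train_mse = np.mean((y - np.polyval(coefs, x)) ** 2)
    test_mse = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    # The flexible degree-10 fit chases training noise: low train MSE, higher test MSE.
    print(degree, train_mse, test_mse)
```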
 
The Bayes Classifier sits on the Classification side and is based on Conditional Probability: it assigns each observation to the class with the largest conditional probability.  The segment of the chart where the probability is exactly 50% is known as the Bayes decision boundary, and the lowest possible error rate is termed the Bayes error rate, similar to the irreducible error.  Although this method is highly accurate, it's difficult to apply in real life scenarios because the true conditional probabilities are rarely known.
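
A tiny sketch of the idea, using a made-up problem where the true conditional probability happens to be known (which is exactly what real life rarely gives us):

```python
import numpy as np

def p_class1_given_x(x):
    # Hypothetical "true" conditional probability P(Y = 1 | X = x),
    # known here only because we invented it.
    return 1 / (1 + np.exp(-(x - 5.0)))

def bayes_classifier(x):
    # Assign to class 1 whenever its conditional probability exceeds 50%;
    # the x where the probability equals exactly 50% is the Bayes decision boundary.
    return (p_class1_given_x(x) > 0.5).astype(int)

x = np.array([2.0, 5.0, 8.0])
print(bayes_classifier(x))  # [0 0 1] -- x = 5.0 sits exactly on the 50% boundary
```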
 
K-Nearest Neighbors attempts to estimate the conditional distribution from the K closest training points and then classifies an observation to the class with the highest estimated probability.  Although a simpler method, it is fairly accurate compared to the Bayes Classifier.
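
And a minimal KNN sketch, again assuming scikit-learn and made-up data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: two predictors, two classes.
X = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# K = 3: estimate class probabilities from the 3 nearest training points.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

print(knn.predict([[2, 2]]))        # classify a new point
print(knn.predict_proba([[2, 2]]))  # the estimated conditional probabilities
```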
 
This blog attempts to summarize the course I'm attending online, Stanford's Statistical Learning.  I'm paraphrasing and identifying the key concepts in an effort to organize and remember.  I use this technique to learn and self-teach, and in no way are these my original thoughts.  I'm reading from the assigned book for the course, "An Introduction to Statistical Learning" (Springer Texts in Statistics), found here, and the authors deserve all the credit!  The course can be found here, and I highly recommend it.

Stay tuned for more blog posts and thanks for reading~!

Bloom Consulting Since Year 2000