**Inference** attempts to understand the relationship between the predictors and the response. If we send in a value of 10 for predictor 1, the result is something; if we send in a value of 20, the result is something else.

**Prediction** attempts to fit the model so that the relationship between the predictors and the response can accurately predict future results.

Both of these are included in **Supervised Learning**, which typically has a Predictor and a Response. *Linear Regression* and *Logistic Regression* are classic versions. Newer techniques include *GAMs*, *Boosting*, and *Support Vector Machines*.

The alternate approach is known as **Unsupervised Learning**. UL also has Predictors but no Response. Basically it attempts to organize the data into buckets to understand relationships and patterns, known as Clustering or *Cluster Analysis*.

There is actually another approach known as **Semi-Supervised Learning**, which combines the two.

Another point of interest is the differentiation between **Flexibility** and **Interpretability**.

Some methods are **restrictive** and **inflexible**, yet they are easy to **interpret**, such as Least Squares or the Lasso. These are typically associated with Inference and Linear Models.

Opposite methods are **flexible**, like thin plate splines, yet more **difficult to interpret**. Flexible models are associated with Splines and Boosting methods, and seeing the relationship between predictor and response is rather difficult.

**Parametric Methods** have a two-step approach: 1. assume the form of the relationship between the data points (for example, that it is linear), 2. apply a procedure to **fit** or **train** the model using training data. One possible effect is **overfitting** the model, when the results are *too accurate* because they account for the *noise* or *errors* too closely.
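
A minimal sketch of the two-step parametric approach for a single predictor, using the closed-form least-squares estimates (the toy data below is made up for illustration):

```python
# Step 1: assume a linear form, f(X) = b0 + b1 * X.
# Step 2: fit (train) b0 and b1 by least squares on the training data.
def fit_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares estimates for slope and intercept.
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
         / sum((x - mean_x) ** 2 for x in xs)
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Made-up training data that roughly follows y = 2x plus noise.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]
b0, b1 = fit_linear(xs, ys)
print(round(b1, 2))  # slope close to 2
```

The noise in the training data means the fitted slope is close to, but not exactly, the true value; a very flexible model would chase that noise and overfit.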

**Non-Parametric Methods** attempt to estimate the data points as closely as possible and typically perform better with more data. The Thin Plate Spline is one method for fitting the data. It too can be **overfit**.

Another topic is Quantitative versus Qualitative.

**Quantitative** involves numerical values, as the word "quantit" itself helps to remember. These are **Regression** problems, such as Least Squares Linear Regression.

**Qualitative** has Classes or Categories. These classes are sometimes **binary**, as in True/False, Yes/No, or Male/Female, and sometimes have more levels, such as Group 1, Group 2, or Group 3. These are **Classification** problems, such as Logistic Regression.

The main takeaway is there is no one silver bullet to apply to every data set. It's the responsibility of the analyst to decide which approach works best for a particular situation as results can vary.

For Regression problems, the **Mean Squared Error, or MSE**, can determine the quality of the results. It's most useful when computed on test data rather than the training data. The lower the MSE the better, as fewer errors translate to more accuracy.
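
MSE is just the average squared difference between actual and predicted values; a minimal sketch (with made-up test values):

```python
def mse(actual, predicted):
    # Average of the squared prediction errors.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up test responses and model predictions.
y_test = [3.0, 5.0, 7.0]
y_hat = [2.5, 5.5, 6.0]
print(mse(y_test, y_hat))  # (0.25 + 0.25 + 1.0) / 3 = 0.5
```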

There are **Reducible Errors**, which come from the function and parameters part of the equation, and **Irreducible Errors**, which are the leftover errors known as Epsilon. **Reducible Errors** can be tweaked; **Irreducible Errors** can not.

One way to offset the **Reducible Errors** is to account for **Bias** and **Variance**. Flexible models tend to have higher variance while inflexible models tend to have lower variance (but higher bias). All regression models will contain some variance or error; trying to remove it entirely results in overfitting.
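
One way to see model variance is a small Monte-Carlo sketch (an assumed toy setup, not from the course): refit the same model on many fresh training sets and watch how the fitted parameter moves from sample to sample.

```python
import random

random.seed(0)

def fit_slope(xs, ys):
    # Least squares for a line through the origin.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Assumed toy truth: f(x) = 2x, with Gaussian noise of sd 1.
slopes = []
for _ in range(200):
    xs = [random.uniform(0, 5) for _ in range(20)]
    ys = [2 * x + random.gauss(0, 1) for x in xs]
    slopes.append(fit_slope(xs, ys))

# The individual fits scatter around the true slope of 2; that
# scatter is the "variance" side of the bias-variance trade-off.
mean_slope = sum(slopes) / len(slopes)
spread = max(slopes) - min(slopes)
print(round(mean_slope, 2), round(spread, 2))
```

A more flexible model (say, a high-degree polynomial) fit on the same data would show a much larger spread across training sets.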

**The Bayes Classifier** falls under the Classification spectrum and is based on **Conditional Probability**. The segment of the chart where the probability is exactly 50% is known as the **Bayes decision boundary**, and the lowest possible error rate is termed the **Bayes error rate**, similar to the *irreducible error*. Since the classifier is based on classes, it always chooses the most probable class. Although this method is as accurate as a classifier can be, it's difficult to apply in real-life scenarios because the true conditional probabilities are rarely known.
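
To make the idea concrete, here is a tiny sketch with a made-up conditional distribution (in practice these probabilities are exactly what we don't know): the Bayes Classifier simply picks the most probable class.

```python
def bayes_classify(cond_probs):
    # cond_probs: dict mapping class label -> P(Y = label | X = x).
    # The Bayes Classifier assigns x to the class with the highest
    # conditional probability.
    return max(cond_probs, key=cond_probs.get)

# Hypothetical probabilities at some point x.
print(bayes_classify({"orange": 0.3, "blue": 0.7}))  # "blue"
```

At a point where both probabilities equal 0.5, neither class dominates; that is exactly the Bayes decision boundary.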

**K-Nearest Neighbors** attempts to estimate the conditional distribution and then classifies an observation into the class with the highest estimated probability. Although a simpler method, it can be fairly accurate compared to the Bayes Classifier.
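
A minimal pure-Python sketch of KNN, using a small made-up training set and Euclidean distance (assumed choices for illustration):

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    # train: list of ((x1, x2), label) pairs.
    # Find the k training points closest to the query (Euclidean
    # distance), then classify by majority vote among their labels --
    # an estimate of the most probable class near the query.
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Made-up two-class training data.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
print(knn_predict(train, (1.5, 1.5)))  # "A"
print(knn_predict(train, (6.5, 6.5)))  # "B"
```

The choice of k controls flexibility: a small k gives a very flexible (high-variance) boundary, while a large k smooths it out.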

This blog attempts to summarize the Stanford Statistical Learning course I'm attending online. I'm paraphrasing and identifying the key concepts in an effort to organize and remember them. I use this technique to learn and self-teach, and in no way are these my original thoughts. I'm reading from the assigned book for the course, "An Introduction to Statistical Learning" (Springer Texts in Statistics), found here, and the authors deserve all the credit! The course can be found here, and I highly recommend it.

Stay tuned for more blog posts, and thanks for reading!