generalization

(1.7 hours to learn)

Summary

When we fit a statistical model, we are interested in generalizing, i.e. making good predictions on data we haven't seen yet. We can fail at this in two ways: by underfitting (missing important structure in the data), or by overfitting (fitting idiosyncrasies of the training data that do not carry over to new data). We can estimate the generalization error of a model by training it on a "training set" and then evaluating it on a separate "test set" that played no part in fitting. Understanding the tradeoff between model fit and model complexity, and how to measure generalization, is key to getting any machine learning algorithm to work in practice.
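
As a rough illustration of the train/test procedure described above, here is a minimal sketch using only the Python standard library. Everything in it is an assumption chosen for illustration and does not come from the listed resources: the toy dataset (noisy samples of sin(x)), the 150/50 split, the choice of k-nearest-neighbor regression as the model, and the helper names make_data, knn_predict and mse. The number of neighbors k acts as a complexity knob: k=1 memorizes the training set, while k equal to the training set size predicts the same average everywhere.

```python
import math
import random

def make_data(n, rng):
    """Noisy samples of sin(x) on [0, 2*pi]; a hypothetical toy dataset."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in xs]

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, data, k):
    """Mean squared error of a k-NN model fit on `train`, evaluated on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
data = make_data(200, rng)
# The samples are i.i.d., so slicing gives a random split:
# 150 points for training, 50 held out as the test set.
train, test = data[:150], data[150:]

results = {}
for k in (1, 10, len(train)):
    # Error on the data the model was fit to vs. error on unseen data.
    results[k] = (mse(train, train, k), mse(train, test, k))
    print(f"k={k:3d}  train MSE={results[k][0]:.3f}  test MSE={results[k][1]:.3f}")
```

With k=1 the training error is exactly zero, because each training point is its own nearest neighbor, yet the test error is not: the gap between the two is the signature of overfitting. With k=150 the model ignores x entirely, so both errors are large, which is underfitting. An intermediate k will typically give the lowest test error on data like this, though the exact numbers depend on the random seed.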

Context

This concept has the prerequisites:

Core resources (read/watch one of the following)

-Free-

Coursera: Machine Learning (2013)
An online machine learning course aimed at a broad audience.
Author: Andrew Y. Ng
Other notes:
  • Click on "Preview" to see the videos.

Supplemental resources (the following are optional, but you may find them useful)

-Free-

The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Authors: Trevor Hastie, Robert Tibshirani, Jerome Friedman

See also