generalization
(1.7 hours to learn)
Summary
When we fit a statistical model, we are interested in generalizing, i.e. making good predictions on data we haven't seen yet. We can fail at this in two ways: by underfitting (missing important structure in the data), or by overfitting (fitting idiosyncrasies of the training data rather than the underlying structure). We can measure the generalization error of a model by training it on a "training set" and then evaluating it on a separate "test set." Understanding the tradeoff between model fit and complexity, and how to measure generalization, is key to getting any machine learning algorithm to work in practice.
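A minimal sketch of this train/test methodology, using a hypothetical dataset (noisy samples of a sine curve) and numpy polynomial fitting; the data, degrees, and split are illustrative choices, not from any particular course or textbook:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: noisy samples of a sine curve (illustration only).
x = rng.uniform(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 30)

# Hold out a test set: train on the first 20 points, test on the last 10.
x_tr, y_tr, x_te, y_te = x[:20], y[:20], x[20:], y[20:]

def mse(deg):
    """Fit a degree-`deg` polynomial on the training set;
    return (training MSE, test MSE)."""
    coef = np.polyfit(x_tr, y_tr, deg)
    pred_tr = np.polyval(coef, x_tr)
    pred_te = np.polyval(coef, x_te)
    return np.mean((pred_tr - y_tr) ** 2), np.mean((pred_te - y_te) ** 2)

for deg in [1, 3, 9]:
    tr, te = mse(deg)
    print(f"degree {deg}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

Training error always shrinks as the polynomial degree grows (the models are nested), so training error alone cannot detect overfitting; only the held-out test error reveals when added complexity stops helping.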
Context
This concept has the prerequisites:
- linear regression (Linear regression is an instructive example highlighting various issues of generalization.)
Core resources (read/watch one of the following)
-Free-
→ Coursera: Machine Learning (2013)
An online machine learning course aimed at a broad audience.
Other notes:
- Click on "Preview" to see the videos.
-Paid-
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location:
Section 1.1, pgs. 4-12
Supplemental resources (the following are optional, but you may find them useful)
-Free-
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
-Paid-
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location:
Sections 1.4.7-1.4.8, pgs. 22-24
See also
- Some techniques for estimating generalization error include:
- Cross-validation, a simple and widely applicable technique
- The Akaike information criterion (for probabilistic models)
- The C_p statistic (for linear regression)
- For linear regression, generalization error can be determined analytically, and breaks down exactly into a sum of bias and variance terms. This provides a useful intuition for other models as well.
- Probably Approximately Correct (PAC) learning, which analyzes whether an algorithm usually learns a good-enough model
- VC dimension, a quantity which characterizes the complexity of a continuously-parameterized model
- Structural risk minimization , a way of controlling overfitting by defining a nested sequence of models of increasing complexity
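Of the techniques listed above, cross-validation is the simplest to sketch. Here is a minimal k-fold version, assuming a hypothetical noisy linear dataset and numpy polynomial fits (all names and choices here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: noisy linear relationship (illustration only).
x = rng.uniform(-1, 1, 40)
y = 2.0 * x + rng.normal(0, 0.3, 40)

def kfold_mse(x, y, deg, k=5):
    """Estimate the generalization error of a degree-`deg` polynomial fit
    by k-fold cross-validation: split the data into k folds, and average
    the mean squared error on each held-out fold after training on the rest."""
    idx = np.arange(len(x))
    folds = np.array_split(idx, k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)          # all indices not in this fold
        coef = np.polyfit(x[train], y[train], deg)
        pred = np.polyval(coef, x[fold])
        errs.append(np.mean((pred - y[fold]) ** 2))
    return np.mean(errs)

print(f"5-fold CV estimate of test MSE (degree 1): {kfold_mse(x, y, 1):.3f}")
```

Because every point serves as test data exactly once, cross-validation uses the data more efficiently than a single train/test split, at the cost of fitting the model k times.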