# generalization

(1.7 hours to learn)

## Summary

When we fit a statistical model, we are interested in generalizing, i.e. making good predictions on data we haven't seen yet. We can fail at this in two ways: by underfitting (missing important structure in the data), or by overfitting (where the model is too sensitive to idiosyncrasies in the data). We can measure the generalization error of a model by training it on a "training set" and then evaluating it on a separate "test set." Understanding the tradeoffs of model fit vs. complexity and how to measure generalization is key to getting any machine learning algorithm to work in practice.
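The train/test procedure can be sketched in a few lines. This is an illustrative setup (synthetic sine data, polynomial fits of varying degree, all names my own), not an example taken from any of the resources below:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a smooth underlying function.
def true_fn(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 15)
y_train = true_fn(x_train) + rng.normal(0, 0.2, 15)
x_test = rng.uniform(0, 1, 200)
y_test = true_fn(x_test) + rng.normal(0, 0.2, 200)

def fit_poly(degree):
    # Least-squares polynomial fit on the training set only;
    # the test set is touched only for evaluation.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for degree in [1, 3, 12]:
    tr, te = fit_poly(degree)
    print(f"degree {degree:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

Training error only decreases as the model gets more complex, so it cannot diagnose overfitting by itself; the held-out test error is what reveals that the degree-1 fit underfits and the degree-12 fit overfits.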

## Context

This concept has the prerequisites:

- linear regression (Linear regression is an instructive example highlighting various issues of generalization.)

## Core resources (read/watch one of the following)

## -Free-

→ Coursera: Machine Learning (2013)

An online machine learning course aimed at a broad audience.

Other notes:

- Click on "Preview" to see the videos.

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location:
Section 1.1, pgs. 4-12

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ The Elements of Statistical Learning

A graduate-level statistical learning textbook with a focus on frequentist methods.

## -Paid-

→ Machine Learning: a Probabilistic Perspective

A very comprehensive graduate-level machine learning textbook.

Location:
Sections 1.4.7-1.4.8, pgs. 22-24

## See also

- Some techniques for estimating generalization error include:
    - Cross-validation, a simple and widely applicable technique
    - The Akaike information criterion (for probabilistic models)
    - The C_p statistic (for linear regression)
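Of the techniques listed above, cross-validation is simple enough to sketch from scratch. A minimal k-fold version for polynomial regression, assuming NumPy and the same synthetic-data style as elsewhere on this page:

```python
import numpy as np

def k_fold_cv_error(x, y, degree, k=5, seed=0):
    """Estimate generalization MSE of a degree-`degree` polynomial fit
    via k-fold cross-validation: hold each fold out in turn, fit on the
    rest, and average the held-out errors."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)  # all indices not in the held-out fold
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[fold])
        errors.append(np.mean((pred - y[fold]) ** 2))
    return float(np.mean(errors))
```

In practice one computes this estimate for each candidate model (here, each degree) and picks the one with the lowest cross-validated error.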

- For linear regression, generalization error can be determined analytically, and breaks down exactly into a sum of bias and variance terms. This provides a useful intuition for other models as well.
- Probably Approximately Correct (PAC) learning, which analyzes whether an algorithm usually learns a good-enough model
- VC dimension, a quantity which characterizes the complexity of a continuously-parameterized model
- Structural risk minimization, a way of controlling overfitting by defining a nested sequence of models of increasing complexity
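The bias-variance decomposition mentioned above takes the following standard form for squared-error loss, where $y = f(x) + \varepsilon$ with noise variance $\sigma^2$, and $\hat{f}$ is the fitted model (the expectation is over training sets):

```latex
\mathbb{E}\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
```

Underfitting corresponds to high bias, overfitting to high variance; the irreducible noise term is a floor no model can beat.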