PAC learning
(2 hours to learn)
Summary
Probably approximately correct (PAC) learning is a theoretical framework for analyzing the generalization error of a learning algorithm in terms of its error on the training set and a measure of the complexity of its hypothesis class. The goal is typically to show that the algorithm achieves low generalization error with high probability.
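As a preview of the kind of guarantee involved (a standard statement for a finite hypothesis class H and a learner that outputs a hypothesis consistent with m i.i.d. training examples; the notation here is assumed for illustration):

    \Pr\big[\exists\, h \in H:\ h \text{ consistent with the sample and } \mathrm{err}(h) > \epsilon\big] \;\le\; |H|\,(1-\epsilon)^m \;\le\; |H|\, e^{-\epsilon m}

Setting the right-hand side to \delta and solving for m gives the sample-complexity form m \ge \tfrac{1}{\epsilon}\big(\ln|H| + \ln\tfrac{1}{\delta}\big), which the goals below ask you to derive and rearrange.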
Context
This concept has the prerequisites:
- generalization (PAC learning is a way of analyzing the generalization performance of learning algorithms.)
- unions of events (The union bound is an important tool for analyzing PAC learning; see the note after this list.)
- independent events (The analysis assumes that the training examples are independent draws from the distribution.)
- Chernoff bounds (Chernoff bounds are an important tool for analyzing PAC learning, e.g. for relating training error to generalization error.)
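To see where these tools enter (a sketch using the illustrative notation above): because the m examples are independent, a single hypothesis h with \mathrm{err}(h) > \epsilon is consistent with all of them with probability at most (1-\epsilon)^m, and the union bound over all such hypotheses gives

    \Pr\Big[\bigcup_{h:\ \mathrm{err}(h) > \epsilon} \{h \text{ consistent with the sample}\}\Big] \;\le\; \sum_{h:\ \mathrm{err}(h) > \epsilon} (1-\epsilon)^m \;\le\; |H|\,(1-\epsilon)^m.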
Goals
- Know what it means for a hypothesis class to be PAC-learnable.
- Derive the bound on the probability of having generalization error greater than epsilon, in terms of the number of training examples and the size of the hypothesis class.
- Try rearranging the terms of the formula to view each variable as a function of the others. E.g., how many training examples do you need in order to guarantee a certain generalization accuracy? (See the sketch after this list.)
- Carry out the analysis for a particular hypothesis class, such as Boolean functions.
- Try to get a sense for how large hypothesis spaces are. E.g., what is the size of the hypothesis space for:
- classifying based on a single Boolean variable (from a set of D)
- all possible functions of D Boolean variables
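The goals above can be explored numerically. Below is a minimal Python sketch; the function name and the finite-class, consistent-learner bound m >= (1/epsilon)(ln|H| + ln(1/delta)) are the illustrative assumptions from the Summary, not something taken from the resources listed below.

    import math

    def sample_complexity(hypothesis_space_size, epsilon, delta):
        """Training examples sufficient for a consistent learner over a finite
        hypothesis class to reach generalization error <= epsilon with
        probability >= 1 - delta, obtained from |H| * exp(-epsilon * m) <= delta."""
        return math.ceil((math.log(hypothesis_space_size) + math.log(1.0 / delta)) / epsilon)

    # Hypothesis space sizes from the goals above, for D Boolean input variables:
    D = 10
    single_variable = 2 * D        # predict one of the D variables or its negation (D if negations are disallowed)
    all_functions = 2 ** (2 ** D)  # every function {0,1}^D -> {0,1}

    print(sample_complexity(single_variable, epsilon=0.1, delta=0.05))
    print(sample_complexity(all_functions, epsilon=0.1, delta=0.05))

Increasing D barely changes the first bound, but the second grows exponentially, since ln|H| = 2^D ln 2 for the class of all Boolean functions of D variables. This is the kind of contrast the last goal is after.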
Core resources (read/watch one of the following)
-Free-
→ Coursera: Machine Learning
An online machine learning course aimed at advanced undergraduates.
Other notes:
- Click on "Preview" to see the videos.
→ Stanford's Machine Learning lecture notes
Lecture notes for Stanford's machine learning course, aimed at graduate and advanced undergraduate students.
-Paid-
→ Artificial Intelligence: a Modern Approach
A textbook giving a broad overview of all of AI.
Location:
Section 18.5, "Why learning works: computational learning theory," pages 668-673
→ An Introduction to Computational Learning Theory (1994)
A graduate level textbook on computational learning theory.
- Section 1.1, "A rectangle learning game," pages 1-6
- Section 1.2, "A general model," pages 6-16
- Section 1.3, "Learning Boolean conjunctions," pages 16-18
See also
- The Occam learning formalism justifies why simple classifiers should be expected to perform well.
- VC dimension is a measure of complexity which allows these results to be generalized to continuous hypothesis spaces.
- PAC-Bayesian analysis is one example of a PAC-style analysis: it allows us to derive PAC-style bounds for Bayesian classifiers.