PAC learning

(2 hours to learn)


Probably approximately correct (PAC) learning is a theoretical framework for analyzing the generalization error of a learning algorithm in terms of its error on a training set and some measure of complexity. The goal is typically to show that an algorithm achieves low generalization error with high probability.


This concept has the prerequisites:


  • Know what it means for a hypothesis class to be PAC-learnable.
  • Derive the bound on the probability of having generalization error greater than epsilon, in terms of the number of training examples and the size of the hypothesis class.
    • Try rearranging the terms of the formula to view each variable as a function of the others. E.g., how many training examples do you need in order to guarantee a certain generalization accuracy?
  • Carry out the analysis for a particular hypothesis class, such as boolean functions.
  • Try to get a sense for how large hypothesis spaces are. E.g., what is the size of the hypothesis space for:
    • classifying based on a single Boolean variable (from a set of D)
    • all possible functions of D Boolean variables

Core resources (read/watch one of the following)


Coursera: Machine Learning
An online machine learning course aimed at advanced undergraduates.
Author: Pedro Domingos
Other notes:
  • Click on "Preview" to see the videos.
Stanford's Machine Learning lecture notes
Lecture notes for Stanford's machine learning course, aimed at graduate and advanced undergraduate students.
Author: Andrew Y. Ng


See also