restricted Boltzmann machines
(2.8 hours to learn)
Summary
Restricted Boltzmann machines (RBMs) are a type of undirected graphical model typically used for learning binary feature representations. The structure consists of a bipartite graph with a layer of visible units to represent the inputs and a layer of hidden units to represent more abstract features. Exact maximum likelihood training is intractable because of the partition function, but approximations such as contrastive divergence work well in practice. RBMs are a building block of many models in deep learning.
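For concreteness, the standard binary RBM over visible units $\mathbf{v}$ and hidden units $\mathbf{h}$, with weights $W$ and biases $\mathbf{a}, \mathbf{b}$, is defined by an energy function:

$$E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^\top \mathbf{v} - \mathbf{b}^\top \mathbf{h} - \mathbf{v}^\top W \mathbf{h}, \qquad p(\mathbf{v}, \mathbf{h}) = \frac{\exp(-E(\mathbf{v}, \mathbf{h}))}{Z}, \qquad Z = \sum_{\mathbf{v}, \mathbf{h}} \exp(-E(\mathbf{v}, \mathbf{h}))$$

The partition function $Z$ sums over exponentially many joint configurations, which is the source of the intractability mentioned above.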
Context
This concept has the prerequisites:
- Markov random fields (RBMs are a kind of MRF.)
- MRF parameter learning (Training RBMs is an instance of MRF parameter learning.)
- stochastic gradient descent (RBMs are trained with (approximate) stochastic gradient descent.)
- Gibbs sampling (Gibbs sampling is part of RBM training.)
Goals
- Know what an RBM is and what distributions it can represent.
- Understand why training an RBM is intractable. In particular,
  - why is it intractable to compute the gradient?
  - why does the likelihood function have local optima?
- Know about the contrastive divergence training criterion and understand what approximation is being made.
- Why does the structure of the model simplify the Gibbs sampling update?
- Be able to implement an RBM training algorithm such as contrastive divergence (a minimal sketch follows this list).
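To make the last three goals concrete, here is a minimal NumPy sketch of CD-1 for a binary RBM. The names (cd1_update, the learning rate lr, the toy data) are illustrative assumptions rather than anything taken from the resources below; this is a bare-bones sketch of the standard algorithm, not a tuned implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, a, b, v0, rng, lr=0.1):
    """One CD-1 update for a binary RBM with weights W, visible biases a, hidden biases b."""
    # Positive phase. Because the graph is bipartite, the hidden units are
    # conditionally independent given the visibles, so p(h | v) factorizes and
    # the whole layer is sampled in one vectorized step:
    #   p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij)
    ph0 = sigmoid(b + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: a single Gibbs step, down to the visibles and back up.
    #   p(v_i = 1 | h) = sigmoid(a_i + sum_j W_ij h_j)
    pv1 = sigmoid(a + h0 @ W.T)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(b + v1 @ W)

    # CD-1 approximates the intractable model expectation in the
    # log-likelihood gradient with these one-step reconstruction statistics.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)

# Toy usage on random binary data, purely to show the shapes involved.
rng = np.random.default_rng(0)
n_vis, n_hid = 6, 3
W = 0.01 * rng.standard_normal((n_vis, n_hid))
a, b = np.zeros(n_vis), np.zeros(n_hid)
data = rng.integers(0, 2, size=(100, n_vis)).astype(float)
for epoch in range(10):
    for v in data:
        cd1_update(W, a, b, v, rng)
```

In practice one would process mini-batches rather than single examples; variants such as persistent contrastive divergence (see below) change only how the negative-phase samples are obtained.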
Core resources (read/watch one of the following)
-Free-
→ Learning deep architectures for AI (2009)
A review paper on deep learning techniques written by one of the leaders in the field.
Other notes:
- Skim chapters 3 and 4 for motivation.
→ Coursera: Neural Networks for Machine Learning (2012)
An online course by Geoff Hinton, who invented many of the core ideas behind neural nets and deep learning.
Other notes:
- You may want to first skim the lectures on Boltzmann machines.
Supplemental resources (the following are optional, but you may find them useful)
-Free-
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Location:
Section 17.4.4, pages 643-645
-Paid-
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location:
Section 27.7, "Restricted Boltzmann machines," pages 983-993
See also
- RBMs can be stacked to build deeper generative models, such as deep belief nets.
- RBMs are often used in unsupervised pre-training, where one initializes a discriminative model from a generative one.
- Persistent contrastive divergence is a variant of CD which tends to learn better generative models.
- Hopfield networks are a classic neural net model of associative memory closely related to RBMs.