deep belief networks
(50 minutes to learn)
Summary
Deep belief networks (DBNs) are deep, multilayer graphical models that contain both directed and undirected edges. The bottom layer represents the inputs, and the higher layers are meant to represent increasingly abstract features of the data. DBNs can be trained in a layerwise fashion, and are often used to initialize deep discriminative neural networks, a procedure known as generative pre-training.
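The layerwise training mentioned above can be sketched in code: each layer is a restricted Boltzmann machine (RBM) trained on the hidden activations of the layer below. This is a minimal illustrative sketch, not the exact procedure from any particular paper; it uses one-step contrastive divergence (CD-1), and all function names and hyperparameters here are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p, rng):
    # Sample binary units from Bernoulli probabilities p.
    return (rng.random(p.shape) < p).astype(float)

def train_rbm(data, n_hidden, epochs=10, lr=0.05, seed=0):
    """Train one RBM with CD-1; returns weights and biases."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)   # visible biases
    c = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        v0 = data
        ph0 = sigmoid(v0 @ W + c)        # positive phase: P(h|v)
        h0 = sample(ph0, rng)
        pv1 = sigmoid(h0 @ W.T + b)      # reconstruction: P(v|h)
        ph1 = sigmoid(pv1 @ W + c)       # negative phase
        W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(data)
        b += lr * (v0 - pv1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

def train_dbn(data, layer_sizes):
    """Greedy layerwise training: fit each RBM to the
    hidden activations produced by the layers below it."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(x, n_hidden)
        rbms.append((W, b, c))
        x = sigmoid(x @ W + c)  # up-pass: this layer's activations
                                # become the next RBM's "data"
    return rbms
```

After training, the stacked weights can initialize a feedforward network of the same shape, which is the generative pre-training procedure the Summary refers to.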
Context
This concept has the prerequisites:
- Markov random fields (Part of a deep belief net is an MRF.)
- Bayesian networks (Part of a deep belief net is a Bayes net.)
- restricted Boltzmann machines (The top two layers of a DBN form an RBM, and DBNs can be trained by training a sequence of RBMs.)
Goals
- Know the graphical model structure of a DBN and understand what the combination of directed and undirected edges represents.
- Understand why the explaining away effect makes exact inference in a DBN intractable.
- Know how to train a DBN in a layerwise fashion.
- Optional: understand mathematically why layerwise training is guaranteed to improve the likelihood.
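For the optional goal, the key idea (as presented in the 2006 fast learning algorithm paper) is a variational lower bound on the DBN log-likelihood. In sketch form, with \(v\) the visible units, \(h\) the first hidden layer, and \(q(h \mid v)\) the approximating posterior given by the first RBM:

\[
\log p(v) \;\ge\; \sum_h q(h \mid v)\,\bigl[\log p(h) + \log p(v \mid h)\bigr] \;+\; \mathcal{H}\bigl(q(h \mid v)\bigr)
\]

If \(p(v \mid h)\) and \(q(h \mid v)\) are frozen at the first RBM's values, the bound depends on the prior \(p(h)\) only through the first term, so replacing \(p(h)\) with a better model of the aggregated hidden activations (the next RBM in the stack) cannot decrease the bound. The bound is tight before the higher layer is trained, which is why greedy layerwise training is guaranteed not to hurt the variational objective.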
Core resources (read/watch one of the following)
-Free-
→ Learning deep architectures for AI (2009)
A review paper on deep learning techniques written by one of the leaders in the field.
Other notes:
- Skim chapters 3 and 4 for motivation
→ Coursera: Neural Networks for Machine Learning (2012)
An online course by Geoff Hinton, who invented many of the core ideas behind neural nets and deep learning.
Other notes:
- You may want to skim the lectures on learning sigmoid belief nets (Lecture 13)
→ A fast learning algorithm for deep belief nets (2006)
The research paper which introduced layerwise training of DBNs.
Supplemental resources (the following are optional, but you may find them useful)
-Paid-
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location:
Section 28.2, "Deep generative models," pages 995-998
See also
- Deep belief nets are commonly used for unsupervised pre-training, where one first trains a generative model and then uses it to initialize a discriminative model.
- Deep Boltzmann machines are another closely related deep architecture.