# restricted Boltzmann machines

(2.8 hours to learn)

## Summary

Restricted Boltzmann machines (RBMs) are a type of undirected graphical model typically used for learning binary feature representations. The structure consists of a bipartite graph with a layer of visible units to represent the inputs and a layer of hidden units to represent more abstract features. Exact maximum likelihood training is intractable, because the gradient involves an expectation under the model distribution, but approximations such as contrastive divergence work well in practice. RBMs are a building block of many models in deep learning.
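
For reference, a binary RBM can be written as follows. The notation is an assumption made here for illustration and is not taken from the resources listed on this page: $v$ and $h$ are the binary visible and hidden vectors, $W$ is the weight matrix, $a$ and $b$ are the visible and hidden biases, and $\sigma$ is the logistic sigmoid.

$$
\begin{aligned}
E(v, h) &= -a^\top v - b^\top h - v^\top W h, \\
p(v, h) &= \frac{\exp(-E(v, h))}{Z}, \qquad Z = \sum_{v', h'} \exp(-E(v', h')), \\
p(h_j = 1 \mid v) &= \sigma\Big(b_j + \sum_i v_i W_{ij}\Big), \\
p(v_i = 1 \mid h) &= \sigma\Big(a_i + \sum_j W_{ij} h_j\Big), \\
\frac{\partial \log p(v)}{\partial W_{ij}} &= \mathbb{E}[v_i h_j \mid v] - \mathbb{E}_{p(v', h')}[v'_i h'_j].
\end{aligned}
$$

Because the graph is bipartite, each conditional factorizes over units, so the first term of the gradient is cheap to compute. The second term is an expectation over all joint configurations, like the partition function $Z$, and its cost grows exponentially with the number of units. Contrastive divergence replaces it with an estimate from a short Gibbs chain started at the data.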

## Context

This concept has the prerequisites:

- Markov random fields (RBMs are a kind of MRF.)
- MRF parameter learning (Training RBMs is an instance of MRF parameter learning.)
- stochastic gradient descent (RBMs are trained with (approximate) stochastic gradient descent.)
- Gibbs sampling (Gibbs sampling is part of RBM training.)

## Goals

- Know what an RBM is and what distributions it can represent.

- Understand why training an RBM is intractable. In particular,
    - why is it intractable to compute the gradient?
    - why does the likelihood function have local optima?

- Know about the contrastive divergence training criterion and understand what approximation is being made.

- Understand why the structure of the model simplifies the Gibbs sampling update.

- Be able to implement an RBM training algorithm such as contrastive divergence.
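
As a starting point for the implementation goal, here is a minimal NumPy sketch of one-step contrastive divergence (CD-1) for a binary RBM. It is an illustration under stated assumptions, not the method of any resource listed on this page: the function names, the learning rate, the toy data, and the choice to use hidden probabilities rather than samples in the update statistics are all choices made here. Note how the bipartite structure lets each layer be sampled in a single vectorized step, since the units within a layer are conditionally independent given the other layer.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sample_hidden(v, W, b, rng):
    """Block Gibbs step: sample every hidden unit at once given the visibles."""
    p_h = sigmoid(v @ W + b)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    return p_h, h


def sample_visible(h, W, a, rng):
    """Block Gibbs step: sample every visible unit at once given the hiddens."""
    p_v = sigmoid(h @ W.T + a)
    v = (rng.random(p_v.shape) < p_v).astype(float)
    return p_v, v


def cd1_update(v0, W, a, b, rng, lr=0.1):
    """One CD-1 update on a mini-batch v0 of shape (batch, n_visible).

    W has shape (n_visible, n_hidden); a and b are the visible and
    hidden biases. Returns updated copies of (W, a, b).
    """
    # Positive phase: hidden statistics with the visibles clamped to the data.
    p_h0, h0 = sample_hidden(v0, W, b, rng)

    # Negative phase: a single Gibbs step started at the data stands in
    # for a sample from the model distribution (the CD approximation).
    _, v1 = sample_visible(h0, W, a, rng)
    p_h1, _ = sample_hidden(v1, W, b, rng)

    n = v0.shape[0]
    W_new = W + lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
    a_new = a + lr * (v0 - v1).mean(axis=0)
    b_new = b + lr * (p_h0 - p_h1).mean(axis=0)
    return W_new, a_new, b_new


def train_demo(n_epochs=200, seed=0):
    """Train a tiny RBM on a hypothetical toy dataset with two patterns."""
    rng = np.random.default_rng(seed)
    data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10, dtype=float)
    n_visible, n_hidden = data.shape[1], 2
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a = np.zeros(n_visible)
    b = np.zeros(n_hidden)
    for _ in range(n_epochs):
        W, a, b = cd1_update(data, W, a, b, rng)
    return W, a, b
```

Calling `train_demo()` returns the learned weights and biases for the toy data. Running more Gibbs steps in the negative phase gives CD-k, at a proportionally higher cost per update.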

## Core resources (read/watch one of the following)

## -Free-

→ Learning deep architectures for AI (2009)

A review paper on deep learning techniques written by one of the leaders in the field.

Other notes:

- Skim chapters 3 and 4 for motivation.

→ Coursera: Neural Networks for Machine Learning (2012)

An online course by Geoff Hinton, who invented many of the core ideas behind neural nets and deep learning.

Other notes:

- You may want to first skim the lectures on Boltzmann machines.

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ The Elements of Statistical Learning

A graduate-level statistical learning textbook with a focus on frequentist methods.

Location:
Section 17.4.4, pages 643-645

## -Paid-

→ Machine Learning: a Probabilistic Perspective

A very comprehensive graduate-level machine learning textbook.

Location:
Section 27.7, "Restricted Boltzmann machines," pages 983-993

## See also

- RBMs can be stacked to build deeper generative models.
- RBMs are often used in unsupervised pre-training, where one initializes a discriminative model from a generative one.
- Persistent contrastive divergence is a variant of contrastive divergence (CD) that tends to learn better generative models.
- Hopfield networks are a classic neural net model of associative memory closely related to RBMs.