variational interpretation of EM
(50 minutes to learn)
The expectation-maximization (EM) algorithm can be interpreted as a coordinate ascent procedure that optimizes a variational lower bound on the likelihood function. This view connects EM to variational inference algorithms and justifies various generalizations of and approximations to the algorithm.
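In more detail: for observed data $\mathbf{X}$, latent variables $\mathbf{Z}$, parameters $\boldsymbol{\theta}$, and any distribution $q(\mathbf{Z})$, the log-likelihood admits the standard decomposition used in the resources below:

\log p(\mathbf{X} \mid \boldsymbol{\theta}) = \mathcal{L}(q, \boldsymbol{\theta}) + \mathrm{KL}\!\left(q(\mathbf{Z}) \,\|\, p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta})\right),
\qquad
\mathcal{L}(q, \boldsymbol{\theta}) = \sum_{\mathbf{Z}} q(\mathbf{Z}) \log \frac{p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta})}{q(\mathbf{Z})}.

Since the KL term is nonnegative, $\mathcal{L}(q, \boldsymbol{\theta})$ is a lower bound on the log-likelihood. The E-step maximizes $\mathcal{L}$ over $q$ with $\boldsymbol{\theta}$ fixed (the optimum is $q(\mathbf{Z}) = p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta})$, which makes the bound tight), and the M-step maximizes $\mathcal{L}$ over $\boldsymbol{\theta}$ with $q$ fixed; alternating the two steps is coordinate ascent on $\mathcal{L}$.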
This concept has the prerequisites:
- Expectation-Maximization algorithm
- maximum likelihood (We analyze EM as an algorithm for maximizing the likelihood.)
- KL divergence (KL divergence is part of the objective function in variational EM.)
- Jensen's inequality (Jensen's inequality is used to show that EM improves a lower bound on the likelihood; see the sketch after this list.)
- optimization problems (EM is an optimization algorithm.)
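For concreteness, here is a minimal sketch of the coordinate-ascent view for a two-component 1-D Gaussian mixture. This is an illustration, not taken from the resources below; the names e_step, m_step, and lower_bound are ours.

    import numpy as np

    def log_gaussian(x, mu, var):
        # Elementwise log N(x | mu, var).
        return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

    def e_step(x, pi, mu, var):
        # Maximize the bound L(q, theta) over q with theta fixed: the
        # optimum is the exact posterior q(z) = p(z | x, theta), which
        # makes the bound tight (the KL term becomes zero).
        log_resp = np.log(pi) + log_gaussian(x[:, None], mu, var)
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        return resp / resp.sum(axis=1, keepdims=True)

    def m_step(x, resp):
        # Maximize the bound over theta with q fixed, i.e. maximize
        # E_q[log p(x, z | theta)]; the entropy of q is constant in theta.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = resp.T @ x / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        return pi, mu, var

    def lower_bound(x, resp, pi, mu, var):
        # L(q, theta) = E_q[log p(x, z | theta)] - E_q[log q(z)].
        log_joint = np.log(pi) + log_gaussian(x[:, None], mu, var)
        return (resp * (log_joint - np.log(resp + 1e-12))).sum()

    # Coordinate ascent: alternate E- and M-steps; the bound never decreases.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])
    pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
    for step in range(50):
        resp = e_step(x, pi, mu, var)
        print(step, lower_bound(x, resp, pi, mu, var))
        pi, mu, var = m_step(x, resp)

After each E-step the bound equals the log-likelihood, so the printed values are exactly the likelihood trace of ordinary EM, which is what the variational interpretation predicts.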
Core resources (read/watch one of the following)
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Section 9.4, pages 450-455
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location: Section 11.4.7, pages 363-365