variational interpretation of EM
(50 minutes to learn)
The expectation-maximization (EM) algorithm can be interpreted as a coordinate ascent procedure that optimizes a variational lower bound on the likelihood function. This view connects EM to variational inference algorithms and justifies various generalizations of and approximations to the algorithm.
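In more detail: for observed data $\mathbf{X}$, latent variables $\mathbf{Z}$, parameters $\boldsymbol{\theta}$, and any distribution $q(\mathbf{Z})$, the log-likelihood admits the standard decomposition used in the resources below:

\log p(\mathbf{X} \mid \boldsymbol{\theta}) = \mathcal{L}(q, \boldsymbol{\theta}) + \mathrm{KL}\!\left(q(\mathbf{Z}) \,\|\, p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta})\right),
\qquad
\mathcal{L}(q, \boldsymbol{\theta}) = \sum_{\mathbf{Z}} q(\mathbf{Z}) \log \frac{p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta})}{q(\mathbf{Z})}.

Since the KL term is nonnegative, $\mathcal{L}(q, \boldsymbol{\theta})$ is a lower bound on the log-likelihood. The E-step maximizes $\mathcal{L}$ over $q$ with $\boldsymbol{\theta}$ fixed (the optimum is $q(\mathbf{Z}) = p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta})$, which makes the bound tight), and the M-step maximizes $\mathcal{L}$ over $\boldsymbol{\theta}$ with $q$ fixed; alternating the two steps is coordinate ascent on $\mathcal{L}$.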
This concept has the prerequisites:
- Expectation-Maximization algorithm
- maximum likelihood (We analyze EM as an algorithm for maximizing the likelihood.)
- KL divergence (KL divergence is part of the objective function in variational EM.)
- Jensen's inequality (Jensen's inequality is used to show that EM improves a lower bound on the likelihood; see the sketch after this list.)
- optimization problems (EM is an optimization algorithm.)
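For concreteness, here is a minimal sketch of the coordinate-ascent view for a two-component 1-D Gaussian mixture. This is an illustration, not taken from the resources below; the names e_step, m_step, and lower_bound are ours.

    import numpy as np

    def log_gaussian(x, mu, var):
        # Elementwise log N(x | mu, var).
        return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

    def e_step(x, pi, mu, var):
        # Maximize the bound L(q, theta) over q with theta fixed: the
        # optimum is the exact posterior q(z) = p(z | x, theta), which
        # makes the bound tight (the KL term becomes zero).
        log_resp = np.log(pi) + log_gaussian(x[:, None], mu, var)
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        return resp / resp.sum(axis=1, keepdims=True)

    def m_step(x, resp):
        # Maximize the bound over theta with q fixed, i.e. maximize
        # E_q[log p(x, z | theta)]; the entropy of q is constant in theta.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = resp.T @ x / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        return pi, mu, var

    def lower_bound(x, resp, pi, mu, var):
        # L(q, theta) = E_q[log p(x, z | theta)] - E_q[log q(z)].
        log_joint = np.log(pi) + log_gaussian(x[:, None], mu, var)
        return (resp * (log_joint - np.log(resp + 1e-12))).sum()

    # Coordinate ascent: alternate E- and M-steps; the bound never decreases.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])
    pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
    for step in range(50):
        resp = e_step(x, pi, mu, var)
        print(step, lower_bound(x, resp, pi, mu, var))
        pi, mu, var = m_step(x, resp)

After each E-step the bound equals the log-likelihood, so the printed values are exactly the likelihood trace of ordinary EM, which is what the variational interpretation predicts.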
Core resources (read/watch one of the following)
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Section 9.4, pages 450-455
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location: Section 11.4.7, pages 363-365