# variational interpretation of EM

(50 minutes to learn)

## Summary

The expectation-maximization (EM) algorithm can be interpreted as coordinate ascent on a variational lower bound on the log-likelihood: the E-step maximizes the bound with respect to the variational distribution over the latent variables, and the M-step maximizes it with respect to the model parameters. This interpretation connects EM to variational inference algorithms and justifies various generalizations of, and approximations to, the algorithm.
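As a concrete (illustrative, not from the resources below) sketch: for a mixture model with observations x, latent assignments z, and parameters θ, the bound is F(q, θ) = E_q[log p(x, z | θ)] + H(q), and each EM half-step can only increase it. The toy example below, a two-component 1-D Gaussian mixture with unit variances assumed for simplicity, checks this monotonicity numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two well-separated Gaussian clusters (unit variance assumed).
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])

def log_joint(x, pi, mu):
    """log p(x, z=k | theta) for each data point and component k."""
    return np.log(pi) - 0.5 * np.log(2 * np.pi) - 0.5 * (x[:, None] - mu) ** 2

def elbo(x, q, pi, mu):
    """Variational lower bound F(q, theta) = E_q[log p(x, z)] + H(q)."""
    return np.sum(q * log_joint(x, pi, mu)) - np.sum(q * np.log(q + 1e-300))

pi = np.array([0.5, 0.5])        # mixing weights
mu = np.array([-1.0, 1.0])       # component means
q = np.full((len(x), 2), 0.5)    # variational distribution over z

bounds = [elbo(x, q, pi, mu)]
for _ in range(20):
    # E-step: coordinate ascent in q -- set q to the posterior p(z | x, theta),
    # which makes the bound tight (the KL term vanishes).
    lj = log_joint(x, pi, mu)
    q = np.exp(lj - lj.max(axis=1, keepdims=True))
    q /= q.sum(axis=1, keepdims=True)
    bounds.append(elbo(x, q, pi, mu))
    # M-step: coordinate ascent in theta with q held fixed.
    pi = q.mean(axis=0)
    mu = (q * x[:, None]).sum(axis=0) / q.sum(axis=0)
    bounds.append(elbo(x, q, pi, mu))

# Every half-step increases the bound (up to numerical tolerance).
assert all(b2 >= b1 - 1e-9 for b1, b2 in zip(bounds, bounds[1:]))
```

Tracking the bound after each half-step, rather than only after full EM iterations, is what makes the coordinate-ascent structure visible: both coordinates (q and θ) contribute monotone improvements.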

## Context

This concept has the prerequisites:

- Expectation-Maximization algorithm (The variational interpretation reinterprets the E and M steps of EM.)
- maximum likelihood (We analyze EM as an algorithm for maximizing the likelihood.)
- KL divergence (KL divergence is part of the objective function in variational EM.)
- Jensen's inequality (Jensen's inequality is used to show that EM improves a lower bound on the likelihood.)
- optimization problems (EM is an optimization algorithm.)

## Core resources (read/watch one of the following)

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location:
Section 9.4, pages 450-455

→ Machine Learning: a Probabilistic Perspective

A very comprehensive graduate-level machine learning textbook.

Location:
Section 11.4.7, pages 363-365

## See also

-No Additional Notes-