# Expectation-Maximization algorithm

(2.1 hours to learn)

## Summary

Expectation-Maximization (EM) is an algorithm for maximum likelihood estimation in models with hidden variables (usually missing data or latent variables). It involves iteratively computing expectations of terms in the log-likelihood function under the current posterior, and then solving for the maximum likelihood parameters. Common applications include fitting mixture models, learning Bayes net parameters with latent data, and learning hidden Markov models.

## Context

This concept has the prerequisites:

- maximum likelihood (EM is a way of finding (approximate) maximum likelihood parameters.)
- mixture of Gaussians models (Mixture of Gaussians is the canonical example of EM.)
- maximum likelihood: multivariate Gaussians (Mixture of Gaussians is the canonical example of EM.)

## Core resources (read/watch one of the following)

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location:
Sections 9.2.1-9.3.1, pages 432-443

→ Machine Learning: a Probabilistic Perspective

A very comprehensive graudate-level machine learning textbook.

Location:
Sections 11.3-11.4.2.4, pages 345-352

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ Mathematical Monk: Machine Learning (2011)

Online videos on machine learning.

Other notes:

- The 16.4-16.5 sequence presents an EM justification by trying to analytically maximize the likelihood rather than the (more typical) justification that the EM algorithm maximizes a lower bound on the likelihood.

→ Bayesian Reasoning and Machine Learning

A textbook for a graudate machine learning course.

Location:
Sections 20.2-20.3.1, pages 402-409

Additional dependencies:

- KL divergence
- Lagrange multipliers

## -Paid-

→ Artificial Intelligence: a Modern Approach

A textbook giving a broad overview of all of AI.

Location:
Section 20.3, subsections "Unsupervised clustering: learning mixtures of Gaussians" (pages 725-727) and "The general form of the EM algorithm" (pages 731-732)

## See also

- Some examples of EM:
- Gaussian mixture models
- Factor analysis
- Baum-Welch algorithm for learning hidden Markov models

- Each step can be shown to increase the model likelihood .
- EM can be viewed as coordinate ascent on a lower bound of the data likelihood .