(2.1 hours to learn)
Expectation-Maximization (EM) is an algorithm for maximum likelihood estimation in models with hidden variables (usually missing data or latent variables). It involves iteratively computing expectations of terms in the log-likelihood function under the current posterior, and then solving for the maximum likelihood parameters. Common applications include fitting mixture models, learning Bayes net parameters with latent data, and learning hidden Markov models.
This concept has the prerequisites:
- maximum likelihood (EM is a way of finding (approximate) maximum likelihood parameters.)
- mixture of Gaussians models (Mixture of Gaussians is the canonical example of EM.)
- maximum likelihood: multivariate Gaussians (Mixture of Gaussians is the canonical example of EM.)
Core resources (read/watch one of the following)
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Sections 9.2.1-9.3.1, pages 432-443
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graudate-level machine learning textbook.
Location: Sections 11.3-22.214.171.124, pages 345-352
Supplemental resources (the following are optional, but you may find them useful)
→ Mathematical Monk: Machine Learning (2011)
Online videos on machine learning.
- The 16.4-16.5 sequence presents an EM justification by trying to analytically maximize the likelihood rather than the (more typical) justification that the EM algorithm maximizes a lower bound on the likelihood.
→ Bayesian Reasoning and Machine Learning
A textbook for a graudate machine learning course.
Location: Sections 20.2-20.3.1, pages 402-409
- KL divergence
- Lagrange multipliers
→ Artificial Intelligence: a Modern Approach
A textbook giving a broad overview of all of AI.
Location: Section 20.3, subsections "Unsupervised clustering: learning mixtures of Gaussians" (pages 725-727) and "The general form of the EM algorithm" (pages 731-732)
- Some examples of EM:
- Gaussian mixture models
- Factor analysis
- Baum-Welch algorithm for learning hidden Markov models
- Each step can be shown to increase the model likelihood .
- EM can be viewed as coordinate ascent on a lower bound of the data likelihood .
- create concept: shift + click on graph
- change concept title: shift + click on existing concept
- link together concepts: shift + click drag from one concept to another
- remove concept from graph: click on concept then press delete/backspace
- add associated content to concept: click the small circle that appears on the node when hovering over it
- other actions: use the icons in the upper right corner to optimize the graph placement, preview the graph, or download a json representation