MRF parameter learning
(1.1 hours to learn)
Summary
The parameters of a Markov random field (MRF) can be fit to data using maximum likelihood. The optimal parameters have an interesting interpretation: they are the parameters for which the expected sufficient statistics under the model match the corresponding statistics of the empirical distribution.
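The matching condition can be written out as a short worked equation. This is a sketch that assumes the MRF is written in log-linear form; the symbols \theta_k (parameters), f_k (sufficient statistics), Z (partition function), and x^{(n)} (the n-th fully observed training example) are notation introduced here for illustration and do not come from the resources listed below.

```latex
% Assumed log-linear parameterization (notation introduced here for illustration)
p(x;\theta) = \frac{1}{Z(\theta)} \exp\Big( \sum_k \theta_k f_k(x) \Big),
\qquad
Z(\theta) = \sum_{x} \exp\Big( \sum_k \theta_k f_k(x) \Big)

% Average log-likelihood of N fully observed examples x^{(1)}, \dots, x^{(N)}
\ell(\theta) = \frac{1}{N} \sum_{n=1}^{N} \sum_k \theta_k f_k(x^{(n)}) - \log Z(\theta)

% Gradient: empirical expectation minus model expectation
\frac{\partial \ell}{\partial \theta_k}
  = \frac{1}{N} \sum_{n=1}^{N} f_k(x^{(n)}) - \mathbb{E}_{p(x;\theta)}[f_k(x)]

% Stationarity condition at the maximum likelihood parameters (moment matching)
\mathbb{E}_{p(x;\theta^\ast)}[f_k(x)] = \frac{1}{N} \sum_{n=1}^{N} f_k(x^{(n)})
\quad \text{for all } k
```

The second term of the gradient is an expectation under the model itself, which is why evaluating it requires inference in the MRF.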
Context
This concept has the prerequisites:
- Markov random fields
- maximum likelihood (Maximum likelihood is a criterion for learning MRF parameters.)
- inference in MRFs (Inference is a necessary part of MRF parameter learning.)
- optimization problems
Goals
- Consider the maximum likelihood objective function for learning the parameters of an MRF given fully observed data.
- Why doesn't the optimization problem decompose into separate optimization problems for each variable?
- Why is it hard even to compute the objective function?
- Derive the gradient of the objective function.
- By setting the gradient to zero, show that for the maximum likelihood parameters, the model statistics must match the data statistics.
- Gradient-based optimization of the likelihood requires performing inference in the MRF. Which quantities need to be computed?
- Optional: show that the maximum likelihood optimization problem is convex (which implies that every local optimum is a global optimum). You may first want to read about covariance matrices and [convex optimization](convex_optimization).
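The goals above can be tried out on a model small enough for exact inference by enumeration. The following is a minimal sketch, not a reference implementation: the three-variable binary chain, the choice of node and edge features, the toy data set, and the names `features`, `model_expectations`, `empirical_expectations`, and `fit` are all hypothetical and chosen only for illustration. It runs gradient ascent on the average log-likelihood, where each step needs the model's expected sufficient statistics.

```python
import itertools
import math

# Hypothetical toy model: 3 binary variables on a chain 0 - 1 - 2.
N_VARS = 3
EDGES = [(0, 1), (1, 2)]
STATES = list(itertools.product([0, 1], repeat=N_VARS))


def features(x):
    """Sufficient statistics: x_i for each node, x_i * x_j for each edge."""
    node = [float(x[i]) for i in range(N_VARS)]
    edge = [float(x[i] * x[j]) for i, j in EDGES]
    return node + edge


# Precompute the feature vector of every joint state (8 states here).
STATE_FEATS = [features(x) for x in STATES]


def model_expectations(theta):
    """Exact inference by brute-force enumeration: returns (log Z, E_model[f])."""
    scores = [sum(t * f for t, f in zip(theta, fx)) for fx in STATE_FEATS]
    top = max(scores)  # subtract the max for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    expect = [0.0] * len(theta)
    for fx, s in zip(STATE_FEATS, scores):
        p = math.exp(s - log_z)
        for k, f in enumerate(fx):
            expect[k] += p * f
    return log_z, expect


def empirical_expectations(data):
    """Average of the sufficient statistics over the fully observed data."""
    totals = [0.0] * (N_VARS + len(EDGES))
    for x in data:
        for k, f in enumerate(features(x)):
            totals[k] += f
    return [t / len(data) for t in totals]


def fit(data, step=0.5, iters=5000):
    """Gradient ascent on the average log-likelihood.

    The gradient is E_data[f] - E_model[f], so every step calls inference.
    """
    emp = empirical_expectations(data)
    theta = [0.0] * len(emp)
    for _ in range(iters):
        _, mod = model_expectations(theta)
        theta = [t + step * (e - m) for t, e, m in zip(theta, emp, mod)]
    return theta, emp


# Made-up data set; every edge configuration appears at least once,
# so the maximum likelihood parameters are finite.
DATA = [(0, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1),
        (1, 0, 0), (0, 0, 1), (1, 1, 1), (0, 1, 0)]

if __name__ == "__main__":
    theta, emp = fit(DATA)
    _, mod = model_expectations(theta)
    print("empirical statistics:", [round(e, 4) for e in emp])
    print("model statistics:    ", [round(m, 4) for m in mod])
```

Enumerating all joint states costs time exponential in the number of variables, which is one way to see why even evaluating the objective is hard in general; for larger models the call to `model_expectations` would have to be replaced by an exact or approximate inference algorithm.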
Core resources (read/watch one of the following)
-Free-
→ Coursera: Probabilistic Graphical Models (2013)
An online course on probabilistic graphical models.
Other notes:
- Click on "Preview" to see the videos.
-Paid-
→ Probabilistic Graphical Models: Principles and Techniques
A very comprehensive textbook for a graduate-level course on probabilistic AI.
- Section 20.1, "Overview," pages 943-944
- Section 20.2, "The likelihood function," pages 944-949
- Section 20.3.1, "Maximum likelihood estimation," pages 949-950
See also
- MRF parameter learning is a special case of maximum likelihood in exponential families.
- As with exponential families, maximum likelihood in MRFs can be interpreted in terms of maximum entropy.
- Some algorithms for learning the parameters: