MRF parameter learning
(1.1 hours to learn)
The parameters of a Markov random field (MRF) can be fit to data using maximum likelihood. The optimal parameters have an interesting interpretation: they are the parameters for which the expected sufficient statistics under the model match the corresponding statistics of the empirical distribution.
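To make the matching condition concrete, here is a sketch under one common assumption: the MRF is written in log-linear form with a feature vector f(x) and weights θ. This notation is introduced for illustration and is not taken from the resources listed below. With N fully observed examples x^(1), ..., x^(N), the average log-likelihood and its gradient are:

```latex
p(x; \theta) = \frac{1}{Z(\theta)} \exp\big(\theta^\top f(x)\big),
\qquad
Z(\theta) = \sum_{x'} \exp\big(\theta^\top f(x')\big)

\ell(\theta) = \frac{1}{N} \sum_{n=1}^{N} \theta^\top f(x^{(n)}) - \log Z(\theta)

\nabla_\theta \ell(\theta)
  = \frac{1}{N} \sum_{n=1}^{N} f(x^{(n)}) - \mathbb{E}_{p(x;\theta)}\big[f(x)\big]
```

Setting the gradient to zero gives the condition that the model's expected features equal the empirical average of the features.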
This concept has the prerequisites:
- Markov random fields
- maximum likelihood (Maximum likelihood is a criterion for learning MRF parameters.)
- inference in MRFs (Inference is a necessary part of MRF parameter learning.)
- optimization problems
Goals
- Consider the maximum likelihood objective function for learning the parameters of an MRF given fully observed data.
  - Why doesn't the optimization problem decompose into separate optimization problems for each variable?
  - Why is it hard even to compute the objective function?
  - Derive the gradient of the objective function.
  - By setting the gradient to zero, show that for the maximum likelihood parameters, the model statistics must match the data statistics.
- Each step of gradient-based optimization of the likelihood requires performing inference in the MRF. Which quantities need to be computed?
- Optional: show that the maximum likelihood optimization problem is convex (which implies that every local optimum is a global optimum). You may first want to read about covariance matrices and [convex optimization](convex_optimization).
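The questions above can be explored numerically on a model small enough to enumerate. The following is a minimal sketch, not a method taken from the listed resources: it assumes a hypothetical chain of three binary variables with log-linear features, and it computes the model expectations by brute-force enumeration, which stands in for the inference step. All names (`EDGES`, `features`, `model_expectations`, `fit`) and the toy data are made up for illustration.

```python
import itertools
import numpy as np

# Hypothetical toy model: 3 binary variables in a chain 0 - 1 - 2.
EDGES = [(0, 1), (1, 2)]
N_VARS = 3

def features(x):
    """Sufficient statistics: one indicator per variable, one agreement indicator per edge."""
    unary = [float(x[i]) for i in range(N_VARS)]
    pairwise = [float(x[i] == x[j]) for i, j in EDGES]
    return np.array(unary + pairwise)

# All 2^3 joint states and their feature vectors (only feasible for tiny models).
STATES = list(itertools.product([0, 1], repeat=N_VARS))
F = np.array([features(x) for x in STATES])  # shape (8, 5)

def model_expectations(theta):
    """E_theta[f(x)] by enumerating every state; this is the inference step."""
    scores = F @ theta
    scores = scores - scores.max()  # subtract the max for numerical stability
    p = np.exp(scores)
    p /= p.sum()                    # dividing by the sum applies the partition function
    return p @ F

def fit(data, lr=0.5, n_steps=5000):
    """Plain gradient ascent on the average log-likelihood."""
    empirical = np.mean([features(x) for x in data], axis=0)
    theta = np.zeros(F.shape[1])
    for _ in range(n_steps):
        # Gradient = data statistics minus model statistics.
        grad = empirical - model_expectations(theta)
        theta += lr * grad
    return theta, empirical

# Made-up fully observed data; every pair configuration appears on each edge,
# so the maximum likelihood parameters are finite.
data = [(0, 0, 0), (1, 1, 1), (1, 1, 0), (0, 1, 1),
        (1, 0, 0), (1, 1, 1), (0, 0, 1), (1, 0, 1)]
theta_hat, empirical = fit(data)
print("data statistics: ", np.round(empirical, 3))
print("model statistics:", np.round(model_expectations(theta_hat), 3))
```

At convergence the two printed vectors should agree up to numerical tolerance, which is the moment-matching condition. For larger graphs the enumeration in `model_expectations` is infeasible and would have to be replaced by an exact or approximate inference algorithm.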
Core resources (read/watch one of the following)
→ Coursera: Probabilistic Graphical Models (2013)
An online course on probabilistic graphical models.
- Click on "Preview" to see the videos.
→ Probabilistic Graphical Models: Principles and Techniques
A very comprehensive textbook for a graduate-level course on probabilistic AI.
- Section 20.1, "Overview," pages 943-944
- Section 20.2, "The likelihood function," pages 944-949
- Section 20.3.1, "Maximum likelihood estimation," pages 949-950
See also
- MRF parameter learning is a special case of maximum likelihood in exponential families.
- As with exponential families, maximum likelihood in MRFs can be interpreted in terms of maximum entropy.
- Some algorithms for learning the parameters: