# MRF parameter learning

(1.1 hours to learn)

## Summary

The parameters of a Markov random field (MRF) can be fit to data using maximum likelihood. The optimal parameters have an interesting interpretation: they are the parameters for which the expected sufficient statistics under the model match the corresponding statistics of the empirical distribution.
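To make the matching condition concrete, here is a sketch under the assumption that the MRF is written in log-linear form. The symbols $\theta$ (parameters), $f$ (sufficient statistics), $Z$ (partition function), and $x^{(1)}, \dots, x^{(N)}$ (fully observed training cases) are notation introduced here for illustration.

$$
p(x; \theta) = \frac{1}{Z(\theta)} \exp\big(\theta^\top f(x)\big),
\qquad
Z(\theta) = \sum_{x} \exp\big(\theta^\top f(x)\big)
$$

$$
\ell(\theta) = \frac{1}{N} \sum_{n=1}^{N} \theta^\top f\big(x^{(n)}\big) - \log Z(\theta)
$$

$$
\nabla_\theta \ell(\theta)
= \frac{1}{N} \sum_{n=1}^{N} f\big(x^{(n)}\big) - \mathbb{E}_{p(x;\theta)}\big[f(x)\big]
$$

Setting the gradient to zero gives $\mathbb{E}_{p(x;\theta)}[f(x)] = \frac{1}{N} \sum_{n} f(x^{(n)})$: the model's expected statistics equal the empirical ones.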

## Context

This concept has the prerequisites:

- Markov random fields
- maximum likelihood (Maximum likelihood is a criterion for learning MRF parameters.)
- inference in MRFs (Inference is a necessary part of MRF parameter learning.)
- optimization problems

## Goals

- Consider the maximum likelihood objective function for learning the parameters of an MRF given fully observed data.
- Why doesn't the optimization problem decompose into separate optimization problems for each variable?
- Why is it hard even to compute the objective function?
- Derive the gradient of the objective function.
- By setting the gradient to zero, show that for the maximum likelihood parameters, the model statistics must match the data statistics.

- Performing gradient ascent on the log-likelihood (equivalently, gradient descent on the negative log-likelihood) requires performing inference in the MRF. Which quantities need to be computed?

- Optional: show that the maximum likelihood optimization problem is convex (which implies that every local optimum is a global optimum). You may first want to read about covariance matrices and [convex optimization](convex_optimization).
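The goals above can be explored numerically on a model small enough to enumerate. The following is a minimal sketch, not a practical learning algorithm: it assumes a hypothetical chain of three binary variables with log-linear features, a made-up data set, and brute-force enumeration in place of a real inference procedure. All names (`EDGES`, `features`, `model_expectation`, `fit`) are illustrative.

```python
import itertools
import numpy as np

# Hypothetical toy model: 3 binary variables on a chain 0 - 1 - 2.
EDGES = [(0, 1), (1, 2)]
N_VARS = 3


def features(x):
    """Sufficient statistics f(x): x_i for each variable, x_i * x_j for each edge."""
    unary = [float(v) for v in x]
    pairwise = [float(x[i] * x[j]) for i, j in EDGES]
    return np.array(unary + pairwise)


# All 2^3 joint states and their feature vectors (feasible only for tiny models).
STATES = list(itertools.product([0, 1], repeat=N_VARS))
F = np.array([features(x) for x in STATES])  # shape (8, 5)


def log_partition(theta):
    """log Z(theta): a sum over every joint state, which is why the objective is costly."""
    scores = F @ theta
    m = scores.max()
    return m + np.log(np.exp(scores - m).sum())


def model_expectation(theta):
    """E_theta[f(x)] by exhaustive enumeration; this stands in for the inference step."""
    p = np.exp(F @ theta - log_partition(theta))
    return p @ F


def avg_log_likelihood(theta, data_F):
    """Average log-likelihood of the data under the log-linear model."""
    return (data_F @ theta).mean() - log_partition(theta)


def fit(data, lr=0.5, steps=5000):
    """Gradient ascent; the gradient is (empirical statistics) - (model statistics)."""
    data_F = np.array([features(x) for x in data])
    empirical = data_F.mean(axis=0)
    theta = np.zeros(F.shape[1])
    for _ in range(steps):
        theta += lr * (empirical - model_expectation(theta))
    return theta, empirical


# Made-up fully observed data set.
data = [(0, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1),
        (1, 0, 0), (0, 0, 1), (1, 1, 1), (0, 0, 0)]
theta_hat, empirical_stats = fit(data)
model_stats = model_expectation(theta_hat)
print(np.round(empirical_stats, 3))
print(np.round(model_stats, 3))
```

After fitting, the two printed vectors should agree closely, which is the moment-matching condition. Each gradient step calls `model_expectation`, so in a larger model every step would need the marginals of the variables and edges that appear in the features.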

## Core resources (read/watch one of the following)

## -Free-

→ Coursera: Probabilistic Graphical Models (2013)

An online course on probabilistic graphical models.

Other notes:

- Click on "Preview" to see the videos.

## -Paid-

→ Probabilistic Graphical Models: Principles and Techniques

A very comprehensive textbook for a graduate-level course on probabilistic AI.

- Section 20.1, "Overview," pages 943-944
- Section 20.2, "The likelihood function," pages 944-949
- Section 20.3.1, "Maximum likelihood estimation," pages 949-950

## See also

- MRF parameter learning is a special case of maximum likelihood in exponential families.
- As with exponential families, maximum likelihood in MRFs can be interpreted in terms of maximum entropy.

- Some algorithms for learning the parameters: