# Bayes net parameter learning

(55 minutes to learn)

## Summary

The parameters of a Bayes net can be estimated using maximum likelihood. In the most general (tabular) parameterization, when the data are fully observed, the ML estimation problem decomposes into independent subproblems, one for each CPT.
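To make the decomposition concrete, here is a minimal sketch (with made-up data for a hypothetical two-node network A → B) of how each CPT's ML estimate reduces to counting: the estimate of P(child | parents) is just the joint count divided by the parent count, computed independently of every other CPT.

```python
from collections import Counter

# Fully observed samples from a toy network A -> B (hypothetical data).
# Because every variable is observed in every sample, the log-likelihood
# splits into one term per CPT, and each CPT can be fit by counting.
data = [
    {"A": 0, "B": 1},
    {"A": 0, "B": 0},
    {"A": 1, "B": 1},
    {"A": 1, "B": 1},
    {"A": 0, "B": 1},
]

def mle_cpt(data, child, parents):
    """ML estimate of P(child | parents): joint counts over parent counts."""
    joint = Counter()   # counts of (parent assignment, child value)
    parent = Counter()  # counts of parent assignment alone
    for sample in data:
        pa = tuple(sample[p] for p in parents)
        joint[(pa, sample[child])] += 1
        parent[pa] += 1
    return {(pa, x): n / parent[pa] for (pa, x), n in joint.items()}

# Each CPT is fit independently: P(A), then P(B | A).
cpt_a = mle_cpt(data, "A", [])       # P(A=0) = 3/5, P(A=1) = 2/5
cpt_b = mle_cpt(data, "B", ["A"])    # P(B=1 | A=0) = 2/3, P(B=1 | A=1) = 1
```

Note that estimating P(B | A) never looks at the parameters of P(A): this is the decomposition the summary refers to, and it fails when some variables are unobserved, because the counts then depend on the other parameters.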

## Context

This concept has the prerequisites:

- maximum likelihood (Maximum likelihood is a simple way to learn Bayes net parameters.)
- Bayesian networks
- optimization problems (Finding the maximum likelihood solution requires solving an optimization problem.)

## Goals

- Know how to determine the maximum likelihood estimate for the parameters in a Bayes net when all of the variables are fully observed.

- In particular, understand why the problem decomposes into independent parameter learning subproblems associated with each CPT, and why the assumption of fully observed data is necessary.
- The decomposition into independent terms isn't just used for maximum likelihood estimation -- it's the basis for a number of other algorithms for learning Bayes nets.

- How does the maximum likelihood solution change when parameters are shared between different CPTs?
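As a hint for the parameter-sharing question, here is a sketch (with made-up sequences, assuming a hypothetical length-3 Markov chain whose transition CPTs are tied, so P(X2 | X1) = P(X3 | X2)): with shared parameters, the ML estimate pools the counts from every CPT slot that uses the shared table, instead of estimating each slot separately.

```python
from collections import Counter

# Hypothetical fully observed sequences from a stationary Markov chain
# X1 -> X2 -> X3 with a single shared transition CPT.
sequences = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
]

# Pool transition counts across all positions that share the parameters.
trans = Counter()   # counts of (previous state, next state)
prev = Counter()    # counts of previous state
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        trans[(a, b)] += 1
        prev[a] += 1

# Shared ML estimate: pooled joint counts over pooled parent counts.
shared_cpt = {(a, b): n / prev[a] for (a, b), n in trans.items()}
```

The likelihood still decomposes over CPTs, but tied CPTs contribute to the same parameters, so their sufficient statistics (the counts) are summed before normalizing.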

## Core resources (read/watch one of the following)

## -Free-

→ Coursera: Probabilistic Graphical Models (2013)

An online course on probabilistic graphical models.

Other notes:

- Click on "Preview" to see the videos.

## -Paid-

→ Probabilistic Graphical Models: Principles and Techniques

A very comprehensive textbook for a graduate-level course on probabilistic AI.

Location:
Section 17.2, "MLE for Bayesian networks," pages 722-728

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ Coursera: Machine Learning

An online machine learning course aimed at advanced undergraduates.

Location:
Lecture "Learning Bayesian networks"

Other notes:

- Click on "Preview" to see the videos.

## -Paid-

→ Artificial Intelligence: A Modern Approach

A textbook giving a broad overview of all of AI.

Location:
Section 20.2, "Learning with complete data," subsections "Maximum likelihood parameter learning: discrete models" and "Naive Bayes models," pages 716-718

## See also

- For any parameter learning problem, we care about how well the learned parameters generalize to new data.
- Bayesian parameter estimation is a way to avoid overfitting and incorporate prior knowledge.
- The expectation-maximization (EM) algorithm gives a way of dealing with missing entries.
- It's possible to learn the structure itself, i.e. which edges should be included.
- We can learn parameters for Markov random fields (MRFs) using similar principles.