# Bayes net structure learning

(3.1 hours to learn)

## Summary

If the structure of a Bayes net (in particular, the set of edges) is not known, we may wish to learn it from data. This requires trading off goodness of fit against the complexity of the network. The Bayesian score gives a simple and efficient way of evaluating Bayes net structures.
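To make the score concrete: under a Dirichlet prior, the log marginal likelihood of a node's conditional distribution has a closed form in Gamma functions of the prior pseudo-counts and the observed counts. The sketch below (the helper `family_score` is our own, assuming a BDeu-style uniform prior with equivalent sample size `alpha`) computes this per-family score from a count table:

```python
from math import lgamma  # log of the Gamma function

def family_score(counts, alpha=1.0):
    """Log marginal likelihood of one node's CPT under a Dirichlet prior.

    counts[j][k] = number of samples in which the node's parents take
    configuration j and the node itself takes value k.
    alpha = total equivalent sample size, spread uniformly over the
    table's cells (a BDeu-style prior).
    """
    num_configs = len(counts)
    score = 0.0
    for row in counts:
        a_j = alpha / num_configs               # pseudo-count for this parent config
        a_jk = alpha / (num_configs * len(row)) # pseudo-count per cell
        n_j = sum(row)
        score += lgamma(a_j) - lgamma(a_j + n_j)
        for n_jk in row:
            score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score

# A binary node observed (6, 4) times, and the same data split evenly
# across the two states of an uninformative candidate parent: the
# parentless family scores higher, illustrating the fit/complexity trade-off.
print(family_score([[6, 4]]) > family_score([[3, 2], [3, 2]]))
```

Because the data are identical in both cases, the score difference here comes entirely from the extra parameters the parent introduces, not from any prior over graph structures.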

## Context

This concept has the prerequisites:

- Bayesian estimation of Bayes net parameters (The same Bayesian analysis is used to derive the Bayesian scoring criterion.)
- Bayesian model comparison (The Bayesian scoring criterion is the Bayes factor for a particular model.)
- generalization (One desideratum for Bayes net structures is that they generalize well to new data.)
- gamma function (The closed form for the Bayesian score includes the Gamma function.)

## Goals

- How are the requirements for structure learning different if your goal is knowledge discovery or density modeling?
  - In particular, if the goal is density modeling, why might you want a graph which is sparser than the true one?
- Why isn't the maximum likelihood score appropriate for evaluating graph structures?
- Be able to derive the Bayesian (marginal likelihood) score for evaluating Bayes nets.
  - What assumptions about the prior are needed for the solution to have a convenient closed form?
  - The Bayesian score implicitly penalizes complex graphs. Why does this happen? (Hint: it's not the prior over graph structures, as you might expect!)
- Give an example where the graph is not identifiable, i.e., there are multiple graph structures which yield the same set of distributions.
- Give an example where the graph structure is identifiable.
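As a minimal illustration of non-identifiability (function names here are our own), the two-node graphs X → Y and Y → X represent exactly the same set of distributions, so they achieve the same maximum likelihood on any dataset:

```python
from math import log

def loglik(counts):
    """Maximized log-likelihood of the data under the graph X -> Y.

    counts[x][y] = number of samples with X=x, Y=y.  The maximum
    likelihood estimates are P(x) = N_x / N and P(y|x) = N_xy / N_x.
    """
    n = sum(sum(row) for row in counts)
    ll = 0.0
    for row in counts:
        n_x = sum(row)
        for n_xy in row:
            if n_xy:
                ll += n_xy * (log(n_x / n) + log(n_xy / n_x))
    return ll

def transpose(counts):
    """Swap the roles of X and Y, i.e. evaluate the graph Y -> X."""
    return [list(col) for col in zip(*counts)]

data = [[30, 10], [5, 55]]
# Both orientations factorize the same joint P(X, Y), so the scores match:
# the two graphs are Markov equivalent and cannot be distinguished by fit.
print(abs(loglik(data) - loglik(transpose(data))) < 1e-9)
```

By contrast, a v-structure such as X → Z ← Y encodes a conditional independence (X ⊥ Y, but not given Z) that no reorientation of its edges reproduces, which is the standard example of an identifiable structure.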

## Core resources (read/watch one of the following)

## -Free-

→ Coursera: Probabilistic Graphical Models (2013)

An online course on probabilistic graphical models.

Other notes:

- Click on "Preview" to see the videos.

## -Paid-

→ Probabilistic Graphical Models: Principles and Techniques

A very comprehensive textbook for a graduate-level course on probabilistic AI.

- Section 18.1, "Introduction" (of chapter 18, "Structure learning in Bayesian networks"), pages 783-785
- Section 18.3, "Structure scores," pages 790-807

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ Coursera: Machine Learning

An online machine learning course aimed at advanced undergraduates.

Other notes:

- Click on "Preview" to see the videos.

## -Paid-

→ Machine Learning: a Probabilistic Perspective

A very comprehensive graduate-level machine learning textbook.

- Section 26.1, "Introduction" (of chapter 26, "Graphical model structure learning"), pages 907-908
- Section 26.4, "Learning DAG structures," pages 914-922

## See also

- BDe priors are a family of prior distributions that result in "fairer" comparisons between models of different complexities.
- Many heuristics have been developed for searching over Bayes net structures.
- If we restrict ourselves to trees, we can find the globally optimal structure under the maximum likelihood scoring criterion; the resulting structures are known as Chow-Liu trees.