Bayes net structure learning

(3.1 hours to learn)

Summary

If the structure of a Bayes net (in particular, the set of edges) is not known, we may wish to learn it from data. This requires trading off how well the network fits the data against the complexity of the network. The Bayesian score gives a simple and efficient way of evaluating Bayes net structures.

Context

This concept has the prerequisites:

Goals

  • How do the requirements for structure learning differ depending on whether your goal is knowledge discovery or density modeling?
    • In particular, if the goal is density modeling, why might you want a graph which is sparser than the true one?
  • Why isn't the maximum likelihood score appropriate for evaluating graph structures?
  • Be able to derive the Bayesian (marginal likelihood) score for evaluating Bayes nets.
    • What assumptions about the prior are needed for the solution to have a convenient closed form?
    • The Bayesian score implicitly penalizes complex graphs. Why does this happen? (Hint: it's not the prior over graph structures, as you might expect!)
  • Give an example where the graph is not identifiable, i.e. there are multiple graph structures which yield the same set of distributions.
  • Give an example where the graph structure is identifiable.
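
As a rough illustration of several of the goals above, here is a minimal sketch comparing the maximum likelihood score with the Bayesian (marginal likelihood) score on a two-variable toy problem. It assumes binary variables and an independent Dirichlet prior with every pseudocount equal to 1 (one convenient choice, not the only one); the function names and the toy data are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import lgamma, log


def family_counts(data, child, parents):
    """Count child values separately for each observed parent configuration."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        counts.setdefault(key, Counter())[row[child]] += 1
    return counts


def ml_family_score(data, child, parents):
    """Maximized log-likelihood of one variable given its parents."""
    total = 0.0
    for ctr in family_counts(data, child, parents).values():
        n = sum(ctr.values())
        total += sum(c * log(c / n) for c in ctr.values())
    return total


def bayes_family_score(data, child, parents, arity=2, alpha=1.0):
    """Log marginal likelihood of one variable given its parents.

    Assumes an independent Dirichlet(alpha, ..., alpha) prior on each
    conditional distribution, so the parameters integrate out in closed form.
    """
    total = 0.0
    for ctr in family_counts(data, child, parents).values():
        n = sum(ctr.values())
        total += lgamma(arity * alpha) - lgamma(arity * alpha + n)
        total += sum(lgamma(alpha + c) - lgamma(alpha) for c in ctr.values())
    return total


def score(data, graph, family_score):
    """Both scores decompose into a sum over (child, parents) families."""
    return sum(family_score(data, child, parents) for child, parents in graph.items())


# Toy data set: X and Y are almost exactly independent in the sample.
cell_counts = {(0, 0): 26, (0, 1): 24, (1, 0): 24, (1, 1): 26}
data = [{"X": x, "Y": y} for (x, y), n in cell_counts.items() for _ in range(n)]

empty = {"X": (), "Y": ()}          # no edges
x_to_y = {"X": (), "Y": ("X",)}     # X -> Y
y_to_x = {"X": ("Y",), "Y": ()}     # Y -> X

for name, graph in [("empty", empty), ("X->Y", x_to_y), ("Y->X", y_to_x)]:
    print(name, score(data, graph, ml_family_score), score(data, graph, bayes_family_score))
```

On this sample, the maximum likelihood score is slightly higher for X -> Y than for the empty graph, even though the apparent dependence is tiny, while the marginal likelihood prefers the empty graph without any explicit prior over structures. The two single-edge graphs receive the same maximum likelihood score, which is a small example of edge direction not being identifiable from the likelihood alone.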

Core resources (read/watch one of the following)

-Free-

Coursera: Probabilistic Graphical Models (2013)
An online course on probabilistic graphical models.
Author: Daphne Koller
Other notes:
  • Click on "Preview" to see the videos.

Supplemental resources (the following are optional, but you may find them useful)

-Free-

Coursera: Machine Learning
An online machine learning course aimed at advanced undergraduates.
Author: Pedro Domingos
Other notes:
  • Click on "Preview" to see the videos.

See also

  • BDe priors are a kind of prior distribution which results in "fairer" comparisons between models of different complexities.
  • Many heuristics have been developed for searching over Bayes net structures.
  • If we restrict ourselves to trees, we can find the globally optimal structure under the maximum likelihood scoring criterion. The resulting structures are known as Chow-Liu trees.
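
To make the tree-structured case concrete, here is a minimal sketch of the Chow-Liu construction: compute the empirical mutual information between every pair of variables, then take a maximum-weight spanning tree. The function names and the toy data (generated by hand to resemble a chain A, B, C) are hypothetical, and the spanning tree is built with a simple Kruskal-style union-find rather than any particular library routine.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(data, u, v):
    """Empirical mutual information (in nats) between variables u and v."""
    n = len(data)
    joint = Counter((row[u], row[v]) for row in data)
    count_u = Counter(row[u] for row in data)
    count_v = Counter(row[v] for row in data)
    return sum(
        (c / n) * log(c * n / (count_u[a] * count_v[b]))
        for (a, b), c in joint.items()
    )


def chow_liu_edges(data, variables):
    """Return the undirected edges of a maximum mutual information spanning tree."""
    weighted = sorted(
        ((mutual_information(data, u, v), u, v) for u, v in combinations(variables, 2)),
        reverse=True,
    )
    parent = {v: v for v in variables}  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = []
    for _, u, v in weighted:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # adding this edge does not create a cycle
            parent[root_u] = root_v
            edges.append((u, v))
    return edges


# Toy data: B usually copies A, and C usually copies B.
row_counts = {
    (0, 0, 0): 40, (0, 0, 1): 5, (0, 1, 1): 4, (0, 1, 0): 1,
    (1, 1, 1): 40, (1, 1, 0): 5, (1, 0, 0): 4, (1, 0, 1): 1,
}
data = [dict(zip("ABC", values)) for values, n in row_counts.items() for _ in range(n)]

tree = chow_liu_edges(data, ["A", "B", "C"])
print(tree)
```

The result is an undirected tree; to obtain a Bayes net, one can pick any variable as the root and direct the edges away from it.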