Bayes net structure learning
(3.1 hours to learn)
If the structure of a Bayes net (in particular, the set of edges) is not known, we may wish to learn it from data. This requires trading off the degree of fit with the complexity of the network. The Bayesian score gives a simple and efficient way of evaluating Bayes net structures.
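Here is what "simple and efficient" means in the standard discrete case. Assume fully observed data, multinomial CPDs, a Dirichlet prior on each CPD row, and global and local parameter independence. Under these assumptions the marginal likelihood has the closed form below. The symbols are introduced here for illustration: $N_{ijk}$ counts the samples with $X_i = k$ while its parents take their $j$-th configuration, $q_i$ and $r_i$ are the numbers of parent configurations and values of $X_i$, and the $\alpha_{ijk}$ are Dirichlet hyperparameters.

```latex
\mathrm{score}_B(G : \mathcal{D}) = \log P(G) + \log P(\mathcal{D} \mid G),
\qquad
P(\mathcal{D} \mid G) = \prod_{i=1}^{n} \prod_{j=1}^{q_i}
  \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
  \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})},
\qquad
N_{ij} = \sum_{k=1}^{r_i} N_{ijk}, \quad \alpha_{ij} = \sum_{k=1}^{r_i} \alpha_{ijk}
```

The score factors into one term per variable and its parent set. Adding or removing a single edge therefore changes only one family's term, which is what makes local search over structures cheap.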
Goals:
- How are the requirements for structure learning different if your goal is knowledge discovery or density modeling?
- In particular, if the goal is density modeling, why might you want a graph which is sparser than the true one?
- Why isn't the maximum likelihood score appropriate for evaluating graph structures?
- Be able to derive the Bayesian (marginal likelihood) score for evaluating Bayes nets.
- What assumptions about the prior are needed for the solution to have a convenient closed form?
- The Bayesian score implicitly penalizes complex graphs. Why does this happen? (Hint: it's not the prior over graph structures, as you might expect!)
- Give an example where the graph is not identifiable, i.e. there are multiple graph structures which yield the same set of distributions.
- Give an example where the graph structure is identifiable.
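Several of the goals above (why maximum likelihood is a poor structure score, where the implicit complexity penalty comes from, and non-identifiability) can be illustrated on two binary variables. The Python sketch below uses only the standard library and assumes complete discrete data. It stands in for the Bayesian score with a BDeu prior of equivalent sample size 1. That prior choice, the helper names such as `bdeu_family` and `graph_score`, and the hand-built count tables are illustrative assumptions, not taken from the resources listed here.

```python
import math
from collections import Counter


def family_counts(data, child, parents):
    """Return N_ij (counts per parent configuration j) and N_ijk (counts per (j, child value k))."""
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    return n_ij, n_ijk


def ml_family(data, child, parents, arity):
    """Maximized log-likelihood of one family: sum over j, k of N_ijk * log(N_ijk / N_ij).
    `arity` is unused; it is kept so both scores share one signature."""
    n_ij, n_ijk = family_counts(data, child, parents)
    return sum(n * math.log(n / n_ij[j]) for (j, _k), n in n_ijk.items())


def bdeu_family(data, child, parents, arity, ess=1.0):
    """Log marginal likelihood of one family under a BDeu prior (ess = equivalent sample size)."""
    q = math.prod(arity[p] for p in parents)  # number of parent configurations (1 if no parents)
    r = arity[child]
    a_ij, a_ijk = ess / q, ess / (q * r)
    n_ij, n_ijk = family_counts(data, child, parents)
    # Unobserved configurations contribute lgamma(a) - lgamma(a) = 0, so only observed ones are summed.
    score = sum(math.lgamma(a_ij) - math.lgamma(a_ij + n) for n in n_ij.values())
    score += sum(math.lgamma(a_ijk + n) - math.lgamma(a_ijk) for n in n_ijk.values())
    return score


def graph_score(family_score, data, graph, arity):
    """Both scores decompose: sum the family score of each child given its parent tuple."""
    return sum(family_score(data, child, parents, arity) for child, parents in graph.items())


def make_data(cell_counts):
    """Expand {(x, y): count} into rows {"X": x, "Y": y}."""
    return [{"X": x, "Y": y} for (x, y), n in cell_counts.items() for _ in range(n)]


arity = {"X": 2, "Y": 2}
empty = {"X": (), "Y": ()}
x_to_y = {"X": (), "Y": ("X",)}
y_to_x = {"X": ("Y",), "Y": ()}

# Hand-built, exactly independent data: 50 rows in each cell.
indep = make_data({(0, 0): 50, (0, 1): 50, (1, 0): 50, (1, 1): 50})
ml_empty = graph_score(ml_family, indep, empty, arity)
ml_edge = graph_score(ml_family, indep, x_to_y, arity)
bd_empty = graph_score(bdeu_family, indep, empty, arity)
bd_edge = graph_score(bdeu_family, indep, x_to_y, arity)
print(f"ML   empty={ml_empty:.3f}  X->Y={ml_edge:.3f}")
print(f"BDeu empty={bd_empty:.3f}  X->Y={bd_edge:.3f}")

# Hand-built, strongly dependent data: X->Y and Y->X are Markov equivalent.
dep = make_data({(0, 0): 70, (0, 1): 10, (1, 0): 20, (1, 1): 100})
bd_xy = graph_score(bdeu_family, dep, x_to_y, arity)
bd_yx = graph_score(bdeu_family, dep, y_to_x, arity)
print(f"BDeu on dependent data: X->Y={bd_xy:.3f}  Y->X={bd_yx:.3f}")
```

On the independent data, adding the edge X→Y cannot lower the ML score, yet the BDeu score prefers the empty graph. No structure prior is used here, so that penalty comes entirely from averaging the likelihood over the extra parameters. On the dependent data, X→Y and Y→X receive identical BDeu scores. This is a small instance of Markov-equivalent structures that data alone cannot tell apart.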
Core resources (read/watch one of the following)
→ Probabilistic Graphical Models: Principles and Techniques
A very comprehensive textbook for a graduate-level course on probabilistic AI.
- Section 18.1, "Introduction" (of chapter 18, "Structure learning in Bayesian networks"), pages 783-785
- Section 18.3, "Structure scores," pages 790-807
Supplemental resources (the following are optional, but you may find them useful)
→ Coursera: Machine Learning
An online machine learning course aimed at advanced undergraduates.
- Click on "Preview" to see the videos.
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
- Section 26.1, "Introduction" (of chapter 26, "Graphical model structure learning"), pages 907-908
- Section 26.4, "Learning DAG structures," pages 914-922
See also:
- BDe priors are a kind of parameter prior that leads to "fairer" comparisons between models of different complexities.
- Many heuristics have been developed for searching over Bayes net structures.
- If we restrict ourselves to trees, we can find the globally optimal structure under the maximum likelihood scoring criterion. The resulting structures are known as Chow-Liu trees.
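The Chow-Liu procedure can be sketched as follows, assuming discrete, fully observed data:

1. Weight every pair of variables by its empirical mutual information.
2. Take a maximum-weight spanning tree.
3. Direct the edges away from an arbitrary root.

This works because the maximized log-likelihood of a tree-structured network equals N times the sum of the edge mutual informations, minus N times the sum of the marginal entropies. The second term does not depend on the tree. The Python below uses Kruskal's algorithm with a union-find structure to build the spanning tree. That choice, the helper names, and the synthetic counts are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j of the data rows."""
    n = len(data)
    n_ij = Counter((row[i], row[j]) for row in data)
    n_i = Counter(row[i] for row in data)
    n_j = Counter(row[j] for row in data)
    # p(a,b) * log(p(a,b) / (p(a) p(b))) rewritten in terms of raw counts
    return sum(c / n * math.log(c * n / (n_i[a] * n_j[b])) for (a, b), c in n_ij.items())


def chow_liu_tree(data, names):
    """Undirected edges of a maximum-weight spanning tree with mutual-information weights."""
    k = len(names)
    weighted = sorted(
        ((mutual_information(data, i, j), i, j) for i, j in combinations(range(k), 2)),
        reverse=True,  # Kruskal on descending weights gives a maximum spanning tree
    )
    parent = list(range(k))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for _w, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it cannot create a cycle
            parent[ri] = rj
            edges.append((names[i], names[j]))
    return edges


def orient_from_root(edges, root):
    """Direct tree edges away from `root`; every root gives a Markov-equivalent DAG."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    directed, stack, seen = [], [root], {root}
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                directed.append((u, v))
                stack.append(v)
    return directed


# Synthetic counts over (A, B, C): B copies A 90% of the time, C copies B 90% of the time.
cell_counts = {
    (0, 0, 0): 81, (0, 0, 1): 9, (0, 1, 1): 9, (0, 1, 0): 1,
    (1, 1, 1): 81, (1, 1, 0): 9, (1, 0, 0): 9, (1, 0, 1): 1,
}
data = [cell for cell, n in cell_counts.items() for _ in range(n)]
names = ["A", "B", "C"]
tree = chow_liu_tree(data, names)
directed = orient_from_root(tree, "A")
print(tree)
print(directed)
```

Rooting the tree at "B" or "C" instead gives a different DAG with the same likelihood. A tree has no v-structures, so all of its orientations are Markov equivalent.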