Bayesian parameter estimation: multinomial distribution
(1.1 hours to learn)
Summary
Suppose we observe a set of draws from a multinomial distribution with unknown parameters, and we're trying to predict the distribution over subsequent draws. If we put a Dirichlet prior over the probabilities, we can analytically integrate out the parameters to get the posterior predictive distribution. This has a very simple form: add the prior's "fake" (pseudo-) counts to the observed counts and normalize. These ideas are used more generally in Bayesian models involving discrete variables.
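The pseudo-count rule described above can be sketched in a few lines of Python. The prior parameters and observed counts below are made-up numbers for illustration:

```python
# A minimal sketch of the Dirichlet-multinomial posterior predictive.
# With a Dirichlet(alpha) prior and observed counts n_k, the predictive
# probability of category k is (alpha_k + n_k) / sum_j (alpha_j + n_j).

alpha = [1.0, 1.0, 1.0]   # symmetric Dirichlet prior ("fake counts"); illustrative
counts = [3, 0, 7]        # observed draws per category; illustrative

# Conjugacy: the posterior is Dirichlet(alpha + counts).
posterior_alpha = [a + n for a, n in zip(alpha, counts)]

# Normalize to get the predictive probability of each category.
total = sum(posterior_alpha)
predictive = [a / total for a in posterior_alpha]

print(predictive)  # probability of each category on the next draw
```

Note that a category with zero observed counts (the second one here) still gets nonzero predictive probability thanks to the prior pseudo-counts.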
Context
This concept has the prerequisites:
- Bayesian parameter estimation (This is an example of Bayesian parameter estimation.)
- multinomial distribution
Goals
- Know how the Dirichlet distribution is defined and what the parameters represent
- Know what the Dirichlet-multinomial model is
- Derive the posterior distribution and posterior predictive distribution
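The derivation the goals refer to can be summarized as follows (standard notation, not taken verbatim from any one of the listed resources):

```latex
% Dirichlet prior over the category probabilities \pi = (\pi_1, \dots, \pi_K):
\[
p(\boldsymbol{\pi} \mid \boldsymbol{\alpha})
  = \frac{\Gamma\!\left(\sum_k \alpha_k\right)}{\prod_k \Gamma(\alpha_k)}
    \prod_{k=1}^{K} \pi_k^{\alpha_k - 1},
\qquad
p(\mathbf{n} \mid \boldsymbol{\pi}) \propto \prod_{k=1}^{K} \pi_k^{n_k}.
\]
% By conjugacy, the posterior is again Dirichlet:
\[
p(\boldsymbol{\pi} \mid \mathbf{n}, \boldsymbol{\alpha})
  = \mathrm{Dirichlet}(\boldsymbol{\alpha} + \mathbf{n}),
\]
% and integrating out \pi gives the posterior predictive:
\[
p(x = k \mid \mathbf{n}, \boldsymbol{\alpha})
  = \frac{\alpha_k + n_k}{\sum_j (\alpha_j + n_j)}.
\]
```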
Core resources (read/watch one of the following)
-Paid-
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location:
Section 2.2, pgs. 74-77
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location:
Sections 2.5.4 (pages 47-49) and 3.4 (pages 78-82)
Additional dependencies:
- Lagrange multipliers
Supplemental resources (the following are optional, but you may find them useful)
-Free-
→ Coursera: Probabilistic Graphical Models (2013)
An online course on probabilistic graphical models.
Additional dependencies:
- maximum likelihood
- Bayesian networks
Other notes:
- Click on "Preview" to see the videos.
See also
- An example of a model built on Dirichlet-multinomial distributions: the Chinese restaurant process, which is an analogue of the Dirichlet-multinomial distribution with infinitely many mixture components.