# Bayesian parameter estimation: multinomial distribution

(1.1 hours to learn)

## Summary

Suppose we observe a set of draws from a multinomial distribution with unknown parameters, and we want to predict the distribution over subsequent draws. If we put a Dirichlet prior over the probabilities, we can analytically integrate out the parameters to get the posterior predictive distribution. This has a very simple form: add pseudo-counts to the observed counts and then normalize. These ideas are used more generally in Bayesian models involving discrete variables.
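
The pseudo-count recipe above can be sketched in a few lines. This is a minimal illustration, not from the listed resources; the function name and the NumPy dependency are my own choices:

```python
import numpy as np

def posterior_predictive(counts, alpha):
    """Posterior predictive for a Dirichlet-multinomial model.

    counts: observed counts N_k for each of the K categories.
    alpha:  Dirichlet concentration parameters (the "fake counts").
    Returns p(next draw = k | data) = (N_k + alpha_k) / sum_j (N_j + alpha_j).
    """
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return (counts + alpha) / (counts + alpha).sum()

# Example: a six-sided die rolled 10 times, with a uniform Dirichlet(1, ..., 1) prior.
counts = [3, 1, 2, 0, 2, 2]
alpha = [1.0] * 6
probs = posterior_predictive(counts, alpha)
# Note that the category with zero observed counts still gets nonzero probability.
```

Notice how the prior smooths the estimate: the outcome never observed in the data still receives probability 1/16 rather than 0.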

## Context

This concept has the prerequisites:

- Bayesian parameter estimation (This is an example of Bayesian parameter estimation.)
- multinomial distribution

## Goals

- Know how the Dirichlet distribution is defined and what the parameters represent

- Know what the Dirichlet-multinomial model is

- Derive the posterior distribution and posterior predictive distribution
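
As a roadmap for the last goal, the standard derivation (in my own notation, with concentration parameters $\alpha_k$ and observed counts $N_k$) runs:

```latex
% Prior: Dirichlet over the category probabilities \theta
p(\theta) \propto \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}

% Likelihood of the observed counts N_1, \dots, N_K:
p(\mathcal{D} \mid \theta) \propto \prod_{k=1}^{K} \theta_k^{N_k}

% Posterior: another Dirichlet, with updated counts (conjugacy)
p(\theta \mid \mathcal{D}) \propto \prod_{k=1}^{K} \theta_k^{\alpha_k + N_k - 1}

% Posterior predictive: integrate out \theta
p(x = k \mid \mathcal{D}) = \frac{\alpha_k + N_k}{\sum_{j=1}^{K} (\alpha_j + N_j)}
```

The last line is exactly the "add fake counts and normalize" rule from the summary: the $\alpha_k$ act as pseudo-counts added to the data before normalizing.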

## Core resources (read/watch one of the following)

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location:
Section 2.2, pgs. 74-77

→ Machine Learning: a Probabilistic Perspective

A very comprehensive graduate-level machine learning textbook.

Location:
Sections 2.5.4 (pages 47-49) and 3.4 (pages 78-82)

Additional dependencies:

- Lagrange multipliers

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ Coursera: Probabilistic Graphical Models (2013)

An online course on probabilistic graphical models.

Additional dependencies:

- maximum likelihood
- Bayesian networks

Other notes:

- Click on "Preview" to see the videos.

## See also

- An example of a model built on these ideas: the Chinese restaurant process is an analogue of the Dirichlet-multinomial model with infinitely many mixture components.