Bayesian parameter estimation

(1.8 hours to learn)


In the Bayesian framework, we treat the parameters of a statistical model as random variables. The model is specified by a prior distribution over the parameter values, together with an observation model which determines how the parameters influence the observed data. When we condition on the observations, we obtain the posterior distribution over the parameters. The term "Bayesian parameter estimation" is something of a misnomer, because we can often skip the parameter estimation step entirely: rather than committing to a single parameter value, we integrate out the parameters and make predictions about future observables directly.
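As a minimal sketch of "integrating out the parameters", consider the beta-Bernoulli model (a coin with unknown heads probability theta and a Beta prior over theta). The integral over theta has a closed form, so the posterior predictive probability of heads is just a ratio of counts; the function name and the specific numbers below are illustrative choices, not part of any standard API.

```python
# Beta-Bernoulli model: prior Beta(a, b) over the heads probability theta.
# Observing h heads in n tosses gives the posterior Beta(a + h, b + n - h).
# Integrating theta out, the predictive probability that the next toss is
# heads equals the posterior mean of theta: (a + h) / (a + b + n).

def posterior_predictive_heads(a, b, h, n):
    """P(next toss = heads), with the parameter theta integrated out."""
    return (a + h) / (a + b + n)

# Uniform prior Beta(1, 1); suppose we observe 7 heads in 10 tosses.
p = posterior_predictive_heads(1.0, 1.0, 7, 10)
print(p)  # (1 + 7) / (1 + 1 + 10) = 8/12 ≈ 0.667
```

Note that no single "estimated" value of theta ever appears: the prediction comes from averaging over the whole posterior.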


This concept has the prerequisites:


  • Know what the terms "prior" and "likelihood function" refer to
  • Be able to compute the posterior distribution using Bayes' Rule
  • Know what the posterior predictive distribution is and how to compute it analytically for a simple example (e.g. a beta-Bernoulli model)
  • Know what a conjugate prior is and why it is useful
  • Understand why the posterior distribution can be given in terms of pseudocounts when a conjugate prior is used
  • Know what the maximum a-posteriori (MAP) approximation is, and be able to give an example where the predictions of the MAP parameters differ from those of the posterior predictive distribution
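The last two points above can be illustrated together in the beta-Bernoulli model: the Beta prior's pseudocounts make the conjugate update a simple count update, and a small-sample case shows the MAP parameters and the posterior predictive distribution disagreeing. This is a hedged sketch; the function names and the Beta(1, 1) example are illustrative choices.

```python
# Conjugacy: with a Beta(a, b) prior, a - 1 and b - 1 act as pseudocounts of
# heads and tails, so after h heads in n tosses the posterior is simply
# Beta(a + h, b + n - h) -- the same family with updated counts.

def map_estimate(a, b, h, n):
    """Posterior mode of theta (assumes a + h > 1 and b + (n - h) > 1,
    except in degenerate cases like the one below where the mode sits
    at the boundary)."""
    return (a + h - 1) / (a + b + n - 2)

def posterior_predictive(a, b, h, n):
    """P(next toss = heads) with theta integrated out (posterior mean)."""
    return (a + h) / (a + b + n)

# Uniform prior Beta(1, 1); a single toss comes up heads.
h, n = 1, 1
print(map_estimate(1, 1, h, n))          # 1.0: MAP predicts heads with certainty
print(posterior_predictive(1, 1, h, n))  # 2/3: the predictive stays hedged
```

Plugging the MAP estimate in as if it were the true parameter throws away posterior uncertainty, which is exactly the gap between the two printed numbers.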

Core resources (read/watch one of the following)


Bayesian Reasoning and Machine Learning
A textbook for a graduate machine learning course.
Author: David Barber
Coursera: Probabilistic Graphical Models (2013)
An online course on probabilistic graphical models.
Author: Daphne Koller
Additional dependencies:
  • maximum likelihood
Other notes:
  • If you're not familiar with Bayes nets, don't worry: most of these lectures don't depend on them.
  • Click on "Preview" to see the videos.


Supplemental resources (the following are optional, but you may find them useful)


Coursera: Neural Networks for Machine Learning (2012)
An online course by Geoff Hinton, who invented many of the core ideas behind neural nets and deep learning.
Location: Lecture, "Introduction to the full Bayesian approach"
Author: Geoffrey E. Hinton


See also