This content of this roadmap follows Prof. Jordan's lectures/textbook. I'll update it as I/we work through the material. Send me an email if you'd like to contribute (colorado AT berkeley DOT edu)
Conditional Independence and Factorization
- Much of our early discussion focused on conditional independence in the context of directed graphical models (Bayes nets) and undirected graphical models (Markov random fields - MRFs)
- We can use the Bayes Ball algorithm to determine conditional independencies in Bayes nets.
- We can use simple reachability algorithms to determine conditional independencies in MRFs
- We briefly discussed factor graphs, which provide a more fine-grained representation of the independencies in a MRF
Exact Inference
- The variable elimination algorithm is based on interchanging sums and products in the definitions of marginals or partition functions.
- The sum product algorithm is a belief propagation algorithm based on dynamic programming. It has the advantage over naive variable elimination in that it reuses computations to compute marginals for all nodes in the graph
- junction trees generalize the sum product algorithm to arbitrary graphs by grouping variables together into cliques such that the cliques form a tree.
Sampling-based inference
- rejection sampling is a Monte Carlo method for sampling from a potentially complex distribution p(x) given a simpler distribution q(x)
- importance sampling is a way of estimating expectations under an intractable distribution p by sampling from a tractable distribution q and reweighting the samples according to the ratio of the probabilities
- We discussed some standard Markov chain Monte Carlo methods:
- Metropolis-Hastings algorithm is a very general method for approximately sampling from a distribution p by defining a Markov chain which has p as a stationary distribution
- Gibbs sampling is an MCMC algorithm where each random variable is iteratively resampled from its conditional distribution given the remaining variables -- it can be viewed as a special case of Metropolis-Hastings
- we also touched on determining MCMC convergence
- we can use simulated annealing to determine a MAP estimates
- Some fancier MCMC algorithms we touched on include:
Statistical Concepts
We discussed Bayesian vs frequentist inference; some topics we touched on include:
- maximum likelihood and asymptotics of maximum likelihood and the various prerequisites for these concepts
- bias-variance decomposition
- Bayesian model averaging
Linear Regression and the Least Mean Squares algorithm
- We discussed linear regression and its closed form solution (i.e. the normal equations) and their various prerequisites
- We also discussed the least mean squares algorithm and its interpretation as stochastic gradient descent
Linear Classifiers
Note: I'll fill out this section as I read through the material and we discuss it in class more thoroughly
- naive Bayes
- logistic regression
- fitting logistic regression with iterative reweighted least squares
- probit regression
Generalized linear models and Generative vs. Discriminative models
Note: I'll fill out this section as I read through the material and we discuss it in class more thoroughly