latent Dirichlet allocation

(55 minutes to learn)


Latent Dirichlet Allocation (LDA) is a probabilistic mixture-of-mixtures (or admixture) model for grouped data. It is most commonly used as a topic model, where the observed data are the words and the groups are the individual documents. In the LDA topic model, each word in a document is generated in two steps. First, a topic is drawn from a document-specific multinomial distribution over topics, whose parameters have a global Dirichlet prior. Second, the word is drawn from that topic, which is itself a multinomial distribution over the vocabulary.
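The generative process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's own formulation. The function names `sample_dirichlet` and `generate_corpus`, the symmetric hyperparameters `alpha` and `eta`, and the choice to also draw each topic from a Dirichlet prior (rather than treating topics as fixed parameters) are all assumptions made for this example. Only the Python standard library is used. A Dirichlet draw is obtained by normalizing independent Gamma samples.

```python
import random


def sample_dirichlet(alpha, rng):
    # Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) samples.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_docs, doc_length, vocab, num_topics, alpha, eta, seed=0):
    """Sample a toy corpus from the LDA generative process (illustrative sketch).

    alpha: symmetric Dirichlet concentration for per-document topic proportions.
    eta:   symmetric Dirichlet concentration for per-topic word distributions
           (assumption: topics get a Dirichlet prior too, as in "smoothed" LDA).
    Returns (topics, corpus, assignments).
    """
    rng = random.Random(seed)
    # Each topic is a multinomial (categorical) distribution over the vocabulary.
    topics = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(num_topics)]
    corpus, assignments = [], []
    for _ in range(num_docs):
        # Document-specific topic proportions, drawn from the global Dirichlet prior.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words, zs = [], []
        for _ in range(doc_length):
            # Pick a topic for this word position, then a word from that topic.
            z = rng.choices(range(num_topics), weights=theta)[0]
            w = rng.choices(vocab, weights=topics[z])[0]
            zs.append(z)
            words.append(w)
        corpus.append(words)
        assignments.append(zs)
    return topics, corpus, assignments


if __name__ == "__main__":
    # Hypothetical toy vocabulary, for illustration only.
    vocab = ["gene", "dna", "cell", "ball", "game", "team"]
    topics, corpus, z = generate_corpus(3, 8, vocab, num_topics=2, alpha=0.5, eta=0.5)
    for doc, zs in zip(corpus, z):
        print(list(zip(doc, zs)))
```

This only runs the model forward. In practice the hard part is inference: recovering the topics and per-document proportions when only the words are observed.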


This concept has the prerequisites:

Core resources (read/watch one of the following)


Topic Models
Location: from 12:00 onwards, including Part 2
Author: David Blei
Other notes:
  • Part 2 goes into more technical detail, while Part 1 is a higher-level overview
Latent Dirichlet Allocation (2003)
The research paper that introduced latent Dirichlet allocation.
Location: Section 3 (pp. 996-999) gives the key description of LDA
Authors: David Blei, Andrew Ng, Michael I. Jordan


See also