latent Dirichlet allocation
(55 minutes to learn)
Latent Dirichlet allocation (LDA) is a probabilistic mixture-of-mixtures (or admixture) model for grouped data. It is most commonly used as a topic model, where the observed data are words and the groups are individual documents. In the LDA topic model, each word in a document is generated in two steps: a topic is drawn from a document-specific multinomial over topics, which has a global Dirichlet prior, and the word is then drawn from that topic, which is itself a multinomial over the vocabulary.
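The generative story above can be sketched in a few lines of NumPy. This is only an illustrative sampler, not an inference algorithm. The function name generate_corpus, the corpus sizes, and the hyperparameters alpha and eta are all hypothetical choices made for this example. Drawing the topics themselves from a symmetric Dirichlet is an added assumption (it corresponds to the "smoothed" variant of the model) so that there is something to sample words from.

```python
import numpy as np

def generate_corpus(n_docs=5, doc_len=20, n_topics=3, vocab_size=10,
                    alpha=0.5, eta=0.5, seed=0):
    """Sample a toy corpus from the LDA generative process (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Assumption: each topic (a multinomial over the vocabulary) is drawn
    # from a symmetric Dirichlet with concentration eta.
    topics = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)

    thetas, docs = [], []
    for _ in range(n_docs):
        # Document-specific multinomial over topics, drawn from the
        # global Dirichlet prior with concentration alpha.
        theta = rng.dirichlet(np.full(n_topics, alpha))
        # One topic assignment per word position.
        z = rng.choice(n_topics, size=doc_len, p=theta)
        # Each word is drawn from the multinomial of its assigned topic.
        words = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])
        thetas.append(theta)
        docs.append(words)

    return topics, np.array(thetas), docs

if __name__ == "__main__":
    topics, thetas, docs = generate_corpus()
    print("topic proportions of document 0:", np.round(thetas[0], 2))
    print("word ids of document 0:", docs[0])
```

Words are represented as integer ids into a vocabulary of size vocab_size. Smaller values of alpha tend to concentrate each document on fewer topics, while larger values spread its words more evenly across topics.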
This concept has the prerequisites:
Core resources (read/watch one of the following)
→ Topic Models
Location: from 12:00 onwards, including Part 2
- Part 2 goes into more technical detail, while Part 1 is a higher-level overview
→ Latent Dirichlet Allocation (2003)
The research paper that introduced latent Dirichlet allocation.
Location: Section 3 (pp. 996-999) provides the key description of LDA
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location: Section 27.3
- Hierarchical Dirichlet processes are a generalization of LDA which can learn an unbounded number of topics.