latent Dirichlet allocation
(55 minutes to learn)
Summary
Latent Dirichlet allocation (LDA) is a probabilistic mixture-of-mixtures (or admixture) model for grouped data. It is most commonly used as a topic model, where the observed data are words and the groups are individual documents. In the LDA topic model, each word in a document is generated in two steps: a topic is drawn from a document-specific multinomial over topics, which has a global Dirichlet prior, and the word is then drawn from that topic's multinomial over the vocabulary.
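The generative story in the summary can be made concrete with a short simulation. The following is a minimal sketch in Python with NumPy, not a reference implementation: the function name `generate_corpus`, its parameters, and the hyperparameter values `alpha` and `eta` are hypothetical choices made for illustration. It also assumes the topics themselves are drawn from a Dirichlet (the "smoothed" variant), simply so that there are topics to sample from.

```python
import numpy as np

def generate_corpus(n_docs, doc_len, n_topics, vocab_size,
                    alpha=0.5, eta=0.5, seed=0):
    """Sample a toy corpus from the LDA generative process (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Assumption: each topic is a multinomial over the vocabulary,
    # itself drawn from a symmetric Dirichlet with parameter eta.
    topics = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)

    thetas, docs = [], []
    for _ in range(n_docs):
        # Document-specific multinomial over topics, drawn from the
        # global Dirichlet prior with parameter alpha.
        theta = rng.dirichlet(np.full(n_topics, alpha))

        # For each word position, draw a topic from theta...
        z = rng.choice(n_topics, size=doc_len, p=theta)

        # ...then draw the word from that topic's distribution.
        words = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])

        thetas.append(theta)
        docs.append(words)

    return topics, np.array(thetas), docs

topics, thetas, docs = generate_corpus(n_docs=3, doc_len=20,
                                       n_topics=4, vocab_size=10)
print(thetas.round(2))  # per-document topic proportions
print(docs[0])          # word ids of the first document
```

Words are represented as integer ids into a vocabulary of size `vocab_size`. Inference in LDA runs this process in reverse: given only `docs`, it estimates the topics and the per-document topic proportions.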
Context
This concept has the prerequisites:
- probabilistic Latent Semantic Analysis (LDA is a Bayesian version of pLSA.)
- Bayesian parameter estimation: multinomial distribution (Bayesian parameter estimation with multinomials is a component of LDA.)
Core resources (read/watch one of the following)
-Free-
→ Topic Models
Location:
from 12:00 onwards including part 2
Other notes:
- Part 2 goes into more technical detail, while part 1 is a higher-level overview.
→ Latent Dirichlet Allocation (2003)
The research paper that introduced latent Dirichlet allocation.
Location:
Section 3 (pp. 996-999) provides the key description of LDA
-Paid-
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location:
Section 27.3
See also
- Hierarchical Dirichlet processes are a generalization of LDA which can learn an unbounded number of topics.