latent Dirichlet allocation
(55 minutes to learn)
Latent Dirichlet allocation (LDA) is a probabilistic mixture-of-mixtures (admixture) model for grouped data. It is most commonly used as a topic model, where the observed data are words and the groups are individual documents. In the LDA topic model, each word in a document is generated by first drawing a topic from a document-specific multinomial over topics, which has a global Dirichlet prior, and then drawing the word from that topic, which is itself a multinomial over the vocabulary.
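The generative story described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not a reference implementation: the function name `generate_corpus`, the corpus sizes, and the hyperparameter values `alpha` and `beta` are all assumptions chosen for the example. Drawing the topics themselves from a Dirichlet (the `beta` prior) is also an assumption, corresponding to the commonly used "smoothed" variant of the model; the description above only requires a Dirichlet prior on the per-document topic proportions.

```python
import numpy as np

def generate_corpus(n_docs=5, doc_len=20, n_topics=3, vocab_size=10,
                    alpha=0.5, beta=0.5, seed=0):
    """Sample a toy corpus from the LDA generative process (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Each topic is a multinomial over the vocabulary.
    # Assumption: topics are drawn from a symmetric Dirichlet(beta).
    topics = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)

    thetas, docs = [], []
    for _ in range(n_docs):
        # Document-specific multinomial over topics, with a global Dirichlet prior.
        theta = rng.dirichlet(np.full(n_topics, alpha))
        # For every word position, choose a topic from the document's proportions...
        z = rng.choice(n_topics, size=doc_len, p=theta)
        # ...then choose a word id from the chosen topic's word distribution.
        words = np.array([rng.choice(vocab_size, p=topics[k]) for k in z])
        thetas.append(theta)
        docs.append(words)

    return topics, np.array(thetas), docs

topics, thetas, docs = generate_corpus()
```

Here `docs` is a list of arrays of word ids, one array per document. Inference in LDA runs this process in reverse: given only the words, it estimates `topics` and `thetas`.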
This concept has the prerequisites:
Core resources (read/watch one of the following)
→ Topic Models
Location: from 12:00 onwards including part 2
- Part 2 goes into more technical detail, while part 1 is a higher-level overview.
→ Latent Dirichlet Allocation (2003)
The research paper that introduced latent Dirichlet allocation.
Location: Section 3 pgs. 996-999 provides the key description of LDA
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location: Section 27.3
- Hierarchical Dirichlet processes are a generalization of LDA which can learn an unbounded number of topics.