probabilistic Latent Semantic Analysis
(45 minutes to learn)
Probabilistic Latent Semantic Analysis (pLSA), also known as probabilistic Latent Semantic Indexing (pLSI), is a matrix decomposition technique for binary and count data in which one component of the data is conditionally independent of the other given an unobserved factor. pLSA is most commonly used for document modeling, where the count data are the number of times each term appears in each document (forming an observed term-by-document count matrix) and the factors are interpreted as latent topics.
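The conditional independence assumption described above can be written out as follows. The symbols are notational assumptions, not taken from this page: d indexes documents, w indexes terms, and z indexes the latent topics.

```latex
% Symmetric form: document and term are independent given the topic z
P(d, w) = \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z)

% Equivalent asymmetric form, obtained with Bayes' rule
P(d, w) = P(d) \sum_{z} P(z \mid d)\, P(w \mid z)
```

Read as a matrix decomposition, the term-by-document matrix of probabilities factors into a product of nonnegative matrices whose inner dimension is the number of topics.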
This concept has the prerequisites:
- Understand the difference between pLSA and LSA
- Why is pLSA considered a statistical model while LSA is not?
- What objective function does pLSA maximize in order to determine the decomposition?
- How would a trained pLSA model handle new documents? (see Blei et al.'s LDA paper)
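As a rough illustration of the objective and the fitting procedure these goals refer to, here is a minimal sketch of expectation-maximization for pLSA in NumPy. It assumes the asymmetric parameterization P(w|d) = sum over z of P(z|d) P(w|z) and maximizes the log-likelihood sum over (d, w) of n(d, w) log P(w|d), leaving out the P(d) term because it does not depend on these parameters. The function name `plsa_em`, its arguments, and the toy count matrix are hypothetical, and the matrix is laid out documents by terms for convenience. It is not the exact presentation used in any of the resources listed here.

```python
import numpy as np

def plsa_em(counts, n_topics, n_iters=50, seed=0):
    """Fit pLSA to a (n_docs, n_terms) count matrix with plain EM.

    Returns P(z|d) with shape (n_docs, n_topics), P(w|z) with shape
    (n_topics, n_terms), and the log-likelihood after each iteration.
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    n_docs, n_terms = counts.shape

    # Random initialisation; each row is normalised to a distribution.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_terms))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    eps = 1e-12  # guards against division by zero and log(0)
    log_liks = []
    for _ in range(n_iters):
        # E-step: responsibilities P(z | d, w), shape (n_docs, n_topics, n_terms).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate both factors from expected counts.
        expected = counts[:, None, :] * resp
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

        # Objective: sum over (d, w) of n(d, w) * log P(w | d).
        p_w_d = p_z_d @ p_w_z
        log_liks.append(float((counts * np.log(p_w_d + eps)).sum()))
    return p_z_d, p_w_z, log_liks

# Toy example (made-up counts): 4 documents, 4 terms, 2 topics.
counts = np.array([[4, 3, 0, 0],
                   [5, 2, 0, 1],
                   [0, 0, 3, 4],
                   [0, 1, 4, 5]])
p_z_d, p_w_z, log_liks = plsa_em(counts, n_topics=2, n_iters=30)
```

Note that P(z|d) is a separate set of parameters for every training document, so this sketch has no row for a document it has not seen. That is the issue behind the last goal above.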
Core resources (read/watch one of the following)
→ Bayesian Reasoning and Machine Learning
A textbook for a graduate machine learning course.
Location: Section 15.6.1 pgs. 323-325
- Expectation-Maximization algorithm
- presents the expectation-maximization algorithm, the standard technique for learning the matrix decomposition
→ Probabilistic Latent Semantic Indexing
- You can gloss over Section 3 if you're not familiar with the expectation-maximization algorithm
Supplemental resources (the following are optional, but you may find them useful)
→ Latent Dirichlet Allocation (2003)
The research paper that introduced latent Dirichlet allocation.
Location: Section 4.3 pgs. 1000-1001
- points out some of the weaknesses of pLSA