latent semantic analysis
(60 minutes to learn)
Latent Semantic Analysis (LSA), or Latent Semantic Indexing (LSI), is a statistical technique typically used for analyzing relationships between a set of documents and the terms they contain. At its core, LSA performs singular value decomposition (SVD) on a term-by-document count matrix of a corpus and interprets the SVD factors as the "topics" of the documents. We can then use these resulting factors (topics) to determine the document-document, document-term, and term-term similarities in the given corpus.
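The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the five-term, four-document count matrix is made up, the number of retained factors `k = 2` is an arbitrary choice, and scaling the factors by the singular values before comparing them is one common convention among several.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents.
terms = ["ship", "boat", "ocean", "tree", "wood"]
X = np.array([
    [1, 0, 0, 0],   # ship
    [0, 1, 0, 0],   # boat
    [1, 1, 0, 0],   # ocean
    [0, 0, 1, 1],   # tree
    [0, 0, 0, 1],   # wood
], dtype=float)

# Thin SVD of the term-by-document matrix: X = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the k largest singular values; each retained factor acts as a "topic".
k = 2  # assumed value, chosen only for illustration
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Low-dimensional representations, scaled by the singular values.
term_vecs = U_k * s_k      # shape (n_terms, k)
doc_vecs = Vt_k.T * s_k    # shape (n_docs, k)

# Rank-k reconstruction of the original count matrix.
X_k = U_k @ np.diag(s_k) @ Vt_k


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


ship, boat, tree = (terms.index(t) for t in ("ship", "boat", "tree"))
raw_ship_boat = cosine(X[ship], X[boat])                   # similarity on raw counts
lsa_ship_boat = cosine(term_vecs[ship], term_vecs[boat])   # term-term similarity in topic space
lsa_ship_tree = cosine(term_vecs[ship], term_vecs[tree])
lsa_doc0_doc1 = cosine(doc_vecs[0], doc_vecs[1])           # document-document similarity
```

In this toy corpus "ship" and "boat" never appear in the same document, so their raw count vectors are orthogonal, but both co-occur with "ocean", and their LSA vectors end up pointing in the same direction. Document-term similarities can be read off in the same way by comparing rows of `term_vecs` with rows of `doc_vecs`.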
This concept has the prerequisites:
- singular value decomposition (The SVD is one step of LSA.)
Core resources (read/watch one of the following)
→ Introduction to Information Retrieval
A textbook on information retrieval techniques.
Location: Section 18.4
- this section focuses on LSA from an information retrieval perspective, where it is referred to as Latent Semantic Indexing (LSI)
→ An Introduction to Latent Semantic Analysis
- a full understanding of SVD is not needed for this introduction
Supplemental resources (the following are optional, but you may find them useful)
- LSA has an odd interpretation when viewed as a probabilistic model. Probabilistic LSA is a similar model with a more principled probabilistic interpretation.
- LSA is closely related to applying principal component analysis to a term-by-document count matrix; the two coincide when the matrix is mean-centered before the SVD
- LSA is commonly used in information retrieval. There, it's referred to as latent semantic indexing.
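The connection to principal component analysis noted above can be checked numerically. The sketch below assumes the usual definition of PCA, in which each term's counts are mean-centered across documents before the decomposition; the count matrix is a made-up example, and terms are treated as the variables with documents as the observations.

```python
import numpy as np

# Hypothetical term-by-document count matrix (rows: terms, columns: documents).
X = np.array([
    [2, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 0, 1],
], dtype=float)
n_docs = X.shape[1]

# PCA via the covariance matrix: np.cov treats each row (term) as a variable.
cov = np.cov(X)                                  # shape (n_terms, n_terms)
pca_variances = np.linalg.eigvalsh(cov)[::-1]    # eigenvalues, largest first

# SVD of the mean-centered matrix (each term's mean count is subtracted).
X_centered = X - X.mean(axis=1, keepdims=True)
s_centered = np.linalg.svd(X_centered, compute_uv=False)

# SVD of the raw counts, as plain LSA uses them.
s_raw = np.linalg.svd(X, compute_uv=False)

# Squared singular values of the centered matrix, rescaled, are the PCA variances.
svd_variances = s_centered ** 2 / (n_docs - 1)
matches_pca = np.allclose(pca_variances[:len(svd_variances)], svd_variances)

# The uncentered decomposition generally has a different spectrum.
raw_differs = not np.allclose(s_raw, s_centered)
```

Whether centering matters in practice depends on the application; count matrices are sparse, and subtracting the mean makes them dense, which is one reason LSA is often run on the raw or reweighted counts.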