# latent semantic analysis

(60 minutes to learn)

## Summary

Latent Semantic Analysis (LSA), or Latent Semantic Indexing (LSI), is a statistical technique typically used for analyzing relationships between a set of documents and the terms they contain. At its core, LSA performs singular value decomposition (SVD) on a term-by-document count matrix of a corpus and interprets the SVD factors as the "topics" of the documents. We can then use these resulting factors (topics) to determine the document-document, document-term, and term-term similarities in the given corpus.
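The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the toy corpus, the choice of `k = 2` factors, and the convention of scaling the factor vectors by the singular values are all assumptions made for the example.

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only).
docs = [
    "cat dog pet",
    "dog pet animal",
    "stock market trade",
    "market trade price",
]

# Term-by-document count matrix: rows are terms, columns are documents.
vocab = sorted({word for doc in docs for word in doc.split()})
index = {term: i for i, term in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        counts[index[word], j] += 1

# SVD: counts = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)

k = 2  # number of latent factors ("topics") to keep; an arbitrary choice here
term_vecs = U[:, :k] * s[:k]    # one k-dimensional row per term
doc_vecs = Vt[:k, :].T * s[:k]  # one k-dimensional row per document

def cosine(a, b):
    """Cosine similarity between two vectors in the latent space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Document-document similarity.
same_topic = cosine(doc_vecs[0], doc_vecs[1])
cross_topic = cosine(doc_vecs[0], doc_vecs[2])

# Term-term similarity: "cat" and "animal" never share a document,
# but both co-occur with "dog" and "pet".
cat_animal = cosine(term_vecs[index["cat"]], term_vecs[index["animal"]])

print(same_topic, cross_topic, cat_animal)
```

In this toy corpus the two pet documents end up close together in the latent space and far from the finance documents, and "cat" and "animal" come out similar even though they never appear in the same document. Real corpora usually apply a weighting such as tf-idf to the counts first and keep far more than two factors.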

## Context

This concept has the prerequisites:

- singular value decomposition (The SVD is one step of LSA.)

## Core resources (read/watch one of the following)

## -Free-

→ Introduction to Information Retrieval

A textbook on information retrieval techniques.

Location:
Section 18.4

Other notes:

- This section focuses on LSA from an information retrieval perspective, where it is referred to as Latent Semantic Indexing (LSI).

→ An Introduction to Latent Semantic Analysis

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ Wikipedia

## See also

- LSA has an odd interpretation when viewed as a probabilistic model. Probabilistic LSA is a similar model with a more principled probabilistic interpretation.
- LSA is closely related to applying principal component analysis to a term-by-document count matrix: both rely on the SVD, and the two coincide when the matrix is mean-centered first.
- LSA is commonly used in information retrieval. There, it's referred to as latent semantic indexing.
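
The relation to principal component analysis noted in this list can be checked numerically. The sketch below assumes that each document is treated as an observation and that each term's counts are mean-centered across documents (PCA centers the data, while a plain SVD of the raw counts does not). The random count matrix is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical count matrix: 6 terms (rows) by 5 documents (columns).
counts = rng.poisson(2.0, size=(6, 5)).astype(float)
n_docs = counts.shape[1]

# Center each term (row) across documents.
centered = counts - counts.mean(axis=1, keepdims=True)

# PCA via eigendecomposition of the term-term covariance matrix.
cov = centered @ centered.T / (n_docs - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# SVD of the same centered matrix.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Squared singular values, rescaled, should match the PCA variances.
variances_from_svd = s**2 / (n_docs - 1)
print(np.allclose(eigvals[: len(s)], variances_from_svd))

# The leading left singular vector should match the first principal
# direction up to sign.
print(abs(eigvecs[:, 0] @ U[:, 0]))
```

Running the SVD on the uncentered `counts` instead generally gives different directions, which is why the centering step matters for the comparison.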