# factor analysis

(1.7 hours to learn)

## Summary

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. In other words, it is possible, for example, that variations in three or four observed variables mainly reflect the variations in fewer unobserved variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors, plus "error" terms. [from Wikipedia]
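As a minimal sketch of the model described above, the following NumPy snippet draws data from a factor analysis model and compares the sample covariance with the covariance the model implies, `W @ W.T + diag(psi)`. The names `W`, `mu`, and `psi` and all numeric values are assumptions chosen for illustration; they do not come from the resources listed on this page.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_observed, n_factors = 50_000, 4, 2

# Hypothetical parameters, chosen only for illustration.
W = rng.normal(size=(n_observed, n_factors))  # factor loadings
mu = np.zeros(n_observed)                     # mean of the observed variables
psi = np.array([0.1, 0.2, 0.3, 0.4])          # per-variable noise variances

# Generative process: z ~ N(0, I), eps ~ N(0, diag(psi)), x = W z + mu + eps
z = rng.normal(size=(n_samples, n_factors))
eps = rng.normal(size=(n_samples, n_observed)) * np.sqrt(psi)
x = z @ W.T + mu + eps

# All correlation between observed variables comes from the shared factors;
# the noise only adds to the diagonal.
model_cov = W @ W.T + np.diag(psi)
sample_cov = np.cov(x, rowvar=False)
max_abs_gap = np.abs(model_cov - sample_cov).max()
```

With enough samples, `max_abs_gap` should be small relative to the entries of `model_cov`, which is one way to see that a few latent factors can account for the correlations among many observed variables.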

## Context

This concept has the prerequisites:

- multivariate distributions (Factor analysis defines a joint distribution.)
- conditional distributions (We are sometimes interested in the conditional distribution of the latent factors given the observed data, or vice versa.)
- computations on multivariate Gaussians (Factor analysis requires manipulating multivariate Gaussians.)
- Expectation-Maximization algorithm (We can use EM to learn the parameters.)
- maximum likelihood: multivariate Gaussians (The M step involves maximum likelihood for multivariate Gaussians.)
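To show how these prerequisites fit together, here is a hedged sketch of EM for factor analysis in NumPy. The E step computes the Gaussian posterior over the latent factors given each data point, and the M step re-estimates the loadings and the diagonal noise variances from the expected sufficient statistics. The function name `fit_factor_analysis`, its parameters, and the synthetic data are assumptions made for illustration, not an implementation taken from any of the listed textbooks.

```python
import numpy as np

def fit_factor_analysis(X, n_factors, n_iter=200, seed=0):
    """EM sketch for factor analysis; returns (mu, W, psi)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu                                # centered data
    s_diag = (Xc ** 2).mean(axis=0)            # diagonal of the sample covariance
    W = rng.normal(size=(d, n_factors))        # random initial loadings
    psi = s_diag.copy()                        # initial noise variances
    for _ in range(n_iter):
        # E step: posterior of z given x is Gaussian with covariance G
        # and mean G W^T Psi^{-1} (x - mu).
        Wt_psi_inv = W.T / psi                 # W^T Psi^{-1}, shape (k, d)
        G = np.linalg.inv(np.eye(n_factors) + Wt_psi_inv @ W)
        Ez = Xc @ Wt_psi_inv.T @ G             # posterior means, shape (n, k)
        Ezz = n * G + Ez.T @ Ez                # sum over data of E[z z^T]
        # M step: weighted least squares for W, then residual variances.
        XtEz = Xc.T @ Ez                       # shape (d, k)
        W = np.linalg.solve(Ezz, XtEz.T).T     # Ezz is symmetric
        psi = s_diag - np.sum(W * XtEz, axis=1) / n
        psi = np.maximum(psi, 1e-6)            # guard against numerical underflow
    return mu, W, psi

# Usage on synthetic data drawn from a known (hypothetical) model.
rng = np.random.default_rng(1)
W_true = rng.normal(size=(6, 2))
psi_true = rng.uniform(0.2, 1.0, size=6)
Z = rng.normal(size=(5000, 2))
X = Z @ W_true.T + rng.normal(size=(5000, 6)) * np.sqrt(psi_true)

mu_hat, W_hat, psi_hat = fit_factor_analysis(X, n_factors=2)
model_cov = W_hat @ W_hat.T + np.diag(psi_hat)
sample_cov = np.cov(X, rowvar=False, bias=True)
```

The loadings are only identifiable up to a rotation of the latent space, so `W_hat` need not match `W_true` entry by entry. Comparing `model_cov` with `sample_cov` is a more meaningful check of the fit.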

## Goals

- Understand the probabilistic interpretation of factor analysis and how the model is closely related to a number of other common probabilistic models used in machine learning.

- How can latent variables capture higher-order correlations? How does this apply to factor analysis?

## Core resources (read/watch one of the following)

## -Free-

→ Bayesian Reasoning and Machine Learning

A textbook for a graduate machine learning course.

## -Paid-

→ Machine Learning: a Probabilistic Perspective

A very comprehensive graduate-level machine learning textbook.

Location:
Section 12.1, pages 381-387

## Supplemental resources (the following are optional, but you may find them useful)

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location:
Section 12.2.4, pages 583-586

Additional dependencies:

- probabilistic PCA

## See also

- Other models related to factor analysis:
  - Principal component analysis (PCA), which finds the maximum variance directions by solving an eigenvalue problem
  - probabilistic PCA, a similar generative model, but where the noise covariance is spherical rather than diagonal
  - probabilistic matrix factorization (PMF), a Bayesian model for predicting missing entries of a matrix
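
For contrast with factor analysis, the eigenvalue formulation of PCA mentioned above can be sketched in a few lines of NumPy. The toy data and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 500 points with very different spread per axis.
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.3])
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / len(Xc)                  # sample covariance matrix

# eigh returns eigenvalues of a symmetric matrix in ascending order.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]        # sort from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

top_direction = eigvecs[:, 0]            # direction of maximum variance
projected_var = (Xc @ top_direction).var()
```

The variance of the data projected onto `top_direction` equals the largest eigenvalue. Unlike factor analysis, this procedure has no per-variable noise term, so rescaling one observed variable can change the directions it finds.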