principal component analysis
(1.9 hours to learn)
Summary
Principal component analysis (PCA) is a method for projecting data into a lower-dimensional space. It works by finding the subspace that maximizes the variance of the projected data or, equivalently, minimizes the reconstruction error. Mathematically, it corresponds to computing the SVD of the centered data matrix, or the spectral decomposition of the covariance matrix.
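The summary above can be illustrated with a minimal NumPy sketch. It assumes data points are stored as rows of an (n, d) array and that the covariance uses a 1/n normalization; both are choices made here, not part of the original. The helper name pca_eig and the synthetic data are hypothetical. The sketch computes the principal directions from the eigendecomposition of the covariance matrix, then checks numerically that the variances of the k projections equal the top k eigenvalues, while the mean squared reconstruction error equals the sum of the discarded eigenvalues (the two views of PCA in the summary).

```python
import numpy as np

def pca_eig(X, k):
    """PCA via the spectral decomposition of the sample covariance matrix.

    X is an (n, d) array with one data point per row; k is the number of
    components to keep. Returns (mean, components, eigenvalues), where
    components is a (d, k) matrix with orthonormal columns and eigenvalues
    holds all d eigenvalues in descending order.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                      # center the data
    C = Xc.T @ Xc / X.shape[0]         # covariance with 1/n normalization (assumed)
    evals, evecs = np.linalg.eigh(C)   # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1]    # re-sort in descending order
    evals, evecs = evals[order], evecs[:, order]
    return mean, evecs[:, :k], evals

# Hypothetical correlated 3-D data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.2]])
k = 2
mean, W, evals = pca_eig(X, k)
Z = (X - mean) @ W                     # k-dimensional projections
X_hat = Z @ W.T + mean                 # reconstruction in the original space

# Variance view: the projections' variances equal the top-k eigenvalues.
print(np.allclose(Z.var(axis=0), evals[:k]))
# Reconstruction view: the mean squared error equals the sum of the rest.
mse = np.sum((X - X_hat) ** 2) / X.shape[0]
print(np.isclose(mse, evals[k:].sum()))
```

Both checks print True; with a 1/(n-1) covariance instead, the same identities hold after rescaling by n/(n-1).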
Context
This concept has the prerequisites:
- covariance matrices (PCA is defined in terms of the covariance matrix.)
- spectral decomposition (PCA is defined in terms of the spectral decomposition of the covariance matrix.)
- singular value decomposition (PCA can be seen as the truncated SVD of the covariance matrix.)
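As a rough illustration of how these prerequisites fit together, the sketch below computes the same principal directions three ways: from the spectral decomposition of the covariance matrix, from the thin SVD of the centered data matrix, and from the SVD of the covariance matrix itself. It uses NumPy, synthetic data, and a 1/n covariance normalization, all assumed for illustration. Because the covariance matrix is symmetric positive semidefinite, its SVD coincides with its spectral decomposition, and the squared singular values of the centered data divided by n give the same eigenvalues. Directions are compared only up to sign, since eigenvectors are determined only up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # hypothetical correlated data
n = X.shape[0]
Xc = X - X.mean(axis=0)                 # center the data
C = Xc.T @ Xc / n                       # covariance with 1/n normalization

# Route 1: spectral decomposition of the covariance matrix (sorted descending).
evals, evecs = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# Route 2: thin SVD of the centered data matrix, Xc = U diag(s) Vt.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # s is sorted descending

# Route 3: SVD of the symmetric positive semidefinite covariance matrix.
Uc, sc, _ = np.linalg.svd(C)

print(np.allclose(s ** 2 / n, evals))   # squared singular values / n = eigenvalues
print(np.allclose(sc, evals))           # singular values of C = its eigenvalues
# Principal directions agree up to sign, so |<v_i, e_i>| = 1 for each component.
print(np.allclose(np.abs(np.sum(Vt.T * evecs, axis=0)), 1.0))
print(np.allclose(np.abs(np.sum(Uc * evecs, axis=0)), 1.0))
```

Keeping only the first k columns of any of these bases gives the truncated decomposition used by PCA.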
Core resources (read/watch one of the following)
-Free-
→ Coursera: Machine Learning (2013)
An online machine learning course aimed at a broad audience.
Location:
Lecture sequence "Dimensionality reduction"
Other notes:
- Click on "Preview" to see the videos.
-Paid-
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location:
Chapter 12 introduction and Section 12.1, pages 559-570
Additional dependencies:
- Lagrange multipliers
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location:
Sections 12.2-12.2.1 (pages 387-389) and 12.2.3 (pages 392-395)
Supplemental resources (the following are optional, but you may find them useful)
-Free-
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Additional dependencies:
- Lagrange multipliers
See also
- Mathematical justification of PCA
- Some ways PCA is commonly used:
  - to visualize datasets by projecting them into a low-dimensional space
  - as a preprocessing step for supervised learning; the idea is to improve generalization or computational efficiency by reducing the dimensionality of the inputs
- latent semantic analysis (LSA), a way of uncovering topics from text
- probabilistic PCA, where the same algorithm is interpreted as fitting a generative model
- factor analysis, another related generative model where each input dimension can have a separate noise variance
- Bayesian PCA
- probabilistic matrix factorization (PMF), a PCA-like model for predicting missing entries of a matrix
- kernel PCA, which implicitly maps the data to a high-dimensional space before computing the PCA vectors
- Fisher's linear discriminant, another projection similar to PCA, but one that also uses class labels
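For the preprocessing use listed above, a minimal sketch might look like the following. The helper names fit_pca and transform and the random data are hypothetical. The point illustrated is that the mean and principal directions are estimated from the training inputs alone and then reused unchanged on test inputs, so the reduced features for both sets live in the same k-dimensional subspace.

```python
import numpy as np

def fit_pca(X_train, k):
    """Estimate the mean and top-k principal directions from training inputs only."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:k].T              # (d,) mean and (d, k) projection matrix

def transform(X, mean, W):
    """Project inputs onto the learned k-dimensional subspace."""
    return (X - mean) @ W

rng = np.random.default_rng(2)
X_train = rng.normal(size=(300, 10))   # hypothetical 10-dimensional inputs
X_test = rng.normal(size=(50, 10))

mean, W = fit_pca(X_train, k=3)
Z_train = transform(X_train, mean, W)  # reduced features for a supervised learner
Z_test = transform(X_test, mean, W)    # test inputs reuse the training mean and directions
print(Z_train.shape, Z_test.shape)     # (300, 3) (50, 3)
```

Setting k=2 in the same sketch gives the two-dimensional coordinates typically used for visualizing a dataset.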