principal component analysis
(1.9 hours to learn)
Principal component analysis (PCA) is a method for projecting data into a lower-dimensional space. It works by finding the subspace which maximizes the variance of the projections or, equivalently, minimizes the reconstruction error. Mathematically, it corresponds to computing the SVD of the centered data matrix, or the spectral decomposition of the covariance matrix.
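The two computational routes mentioned above can be compared on a small example. The following is a minimal NumPy sketch, not a reference implementation: the function names `pca_svd` and `pca_eig` and the synthetic data are illustrative assumptions, and the covariance uses the usual (n - 1) normalization.

```python
import numpy as np

def pca_svd(X, k):
    """Top-k principal directions and variances via SVD of the centered data."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # rows are principal directions
    variances = s[:k] ** 2 / (X.shape[0] - 1)    # variance along each direction
    return components, variances

def pca_eig(X, k):
    """Same quantities via the spectral decomposition of the covariance matrix."""
    C = np.cov(X, rowvar=False)                  # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    return eigvecs[:, order].T, eigvals[order]

# Synthetic correlated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

W_svd, var_svd = pca_svd(X, 2)
W_eig, var_eig = pca_eig(X, 2)

# Project onto the 2-D subspace, then reconstruct in the original space.
Xc = X - X.mean(axis=0)
Z = Xc @ W_svd.T
X_hat = Z @ W_svd

# Reconstruction error and total variance, both with (n - 1) normalization.
recon_error = np.sum((Xc - X_hat) ** 2) / (X.shape[0] - 1)
total_var = np.trace(np.cov(X, rowvar=False))
```

Both routes should give the same directions up to sign, and the reconstruction error should equal the total variance minus the variance captured by the projection, which is why maximizing one is the same as minimizing the other.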
This concept has the prerequisites:
- covariance matrices (PCA is defined in terms of the covariance matrix.)
- spectral decomposition (PCA is defined in terms of the spectral decomposition of the covariance matrix.)
- singular value decomposition (PCA can be seen as the truncated SVD of the centered data matrix.)
Core resources (read/watch one of the following)
→ Coursera: Machine Learning (2013)
An online machine learning course aimed at a broad audience.
Location: Lecture sequence "Dimensionality reduction"
- Click on "Preview" to see the videos.
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Chapter 12 introduction and Section 12.1, pages 559-570
- Lagrange multipliers
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location: Sections 12.2-12.2.1 (pages 387-389) and 12.2.3 (pages 392-395)
Supplemental resources (the following are optional, but you may find them useful)
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
- Lagrange multipliers
- Mathematical justification of PCA
- Some ways PCA is commonly used:
- to visualize datasets by projecting into a low-dimensional space
- as a preprocessing step for supervised learning; the idea is to improve generalization or computational efficiency by reducing the dimensionality of the inputs
- latent semantic analysis (LSA), a way of uncovering topics from text
- probabilistic PCA, where the same algorithm is interpreted as fitting a generative model
- factor analysis, another related generative model where each input dimension can have a separate noise variance
- Bayesian PCA
- probabilistic matrix factorization (PMF), a PCA-like model for predicting missing entries of a matrix
- kernel PCA, which implicitly maps the data to a high-dimensional space before computing the PCA vectors
- Fisher's linear discriminant is another projection method similar to PCA, but one that also uses class labels.
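To make the kernel PCA item in the list above more concrete, here is a minimal NumPy sketch. The function names `kernel_pca` and `rbf_kernel`, the `gamma` parameter, and the random data are all illustrative assumptions, not the API of any library. The implicit high-dimensional mapping only ever appears through the kernel (Gram) matrix, which is centered and then eigendecomposed.

```python
import numpy as np

def kernel_pca(K, k):
    """Project the training points onto the top-k kernel principal components.

    K is an (n, n) kernel matrix. Hypothetical helper for illustration.
    """
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one     # center the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]          # indices of the k largest
    lam, alpha = eigvals[order], eigvecs[:, order]
    # Coordinates of each training point along each component.
    return alpha * np.sqrt(np.maximum(lam, 0.0))

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                       # hypothetical data

Z_linear = kernel_pca(X @ X.T, 2)                  # linear kernel
Z_rbf = kernel_pca(rbf_kernel(X, gamma=0.5), 2)    # nonlinear kernel

# Ordinary PCA scores for comparison with the linear-kernel case.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z_pca = Xc @ Vt[:2].T
```

With a linear kernel this should reduce to ordinary PCA: `Z_linear` matches `Z_pca` up to the sign of each column. With the RBF kernel the projection is nonlinear in the original inputs.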