independent component analysis
(1.1 hours to learn)
Summary
Independent component analysis (ICA) is a latent variable model in which the observations are modeled as linear combinations of independent latent variables ("sources"), which are usually assumed to follow heavy-tailed, non-Gaussian distributions. Common uses include source separation and sparse dictionary learning.
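As a concrete illustration of source separation, the sketch below mixes two toy non-Gaussian signals (a square wave and heavy-tailed noise, both made up for this example) with an arbitrary mixing matrix, then recovers them with scikit-learn's FastICA. Recovery is only up to permutation and scaling of the sources.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent non-Gaussian sources (hypothetical toy signals)
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
sources = np.column_stack([np.sign(np.sin(3 * t)),        # square wave
                           rng.laplace(size=t.shape)])    # heavy-tailed noise

# Mix them with an arbitrary 2x2 mixing matrix
mixing = np.array([[1.0, 0.5],
                   [0.7, 1.2]])
observations = sources @ mixing.T

# Recover the sources up to permutation and scaling
ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(observations)
print(recovered.shape)  # one row per time step, one column per source
```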
Context
This concept has the prerequisites:
- heavy-tailed distributions (ICA typically involves fitting heavy-tailed distributions.)
- principal component analysis (PCA is often used as a preprocessing step, and it's useful to compare ICA with PCA.)
- maximum likelihood (ICA is usually fit using maximum likelihood.)
- determinant (The maximum likelihood objective function includes a determinant.)
- orthonormal bases (Standard ICA has an orthogonality constraint.)
- multivariate Gaussian distribution (ICA requires that the component distributions be non-Gaussian.)
- optimization problems (Fitting ICA requires solving an optimization problem.)
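Several of these prerequisites come together in the ICA log-likelihood. For a square unmixing matrix W and a source density p, the log-likelihood of one observation x is log|det W| + Σᵢ log p(wᵢᵀx), where the determinant term comes from the change of variables. A minimal NumPy sketch, assuming (for illustration only) a standard Laplace prior on each source:

```python
import numpy as np

def ica_log_likelihood(W, X):
    """Average ICA log-likelihood of observations X (n_samples, d)
    under unmixing matrix W (d, d), assuming standard Laplace sources:
    log p(s) = -|s| - log 2 per component."""
    S = X @ W.T                           # recovered sources, one row per sample
    log_prior = -np.abs(S) - np.log(2.0)  # Laplace log-density, elementwise
    # Change-of-variables term: log |det W|, shared by every sample
    sign, logdet = np.linalg.slogdet(W)
    return log_prior.sum(axis=1).mean() + logdet

# Toy data just to evaluate the objective
rng = np.random.default_rng(1)
X = rng.laplace(size=(500, 2))
print(ica_log_likelihood(np.eye(2), X))
```

Fitting ICA then means maximizing this objective over W, which is the optimization problem referred to above.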
Core resources (read/watch one of the following)
-Free-
→ Information Theory, Inference, and Learning Algorithms
A graduate-level textbook on machine learning and information theory.
→ Stanford's Machine Learning lecture notes
Lecture notes for Stanford's machine learning course, aimed at graduate and advanced undergraduate students.
-Paid-
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location:
Sections 12.6-12.6.1, pages 407-411
Supplemental resources (the following are optional, but you may find them useful)
-Free-
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Additional dependencies:
- differential entropy
-Paid-
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location:
Section 12.4.1, pages 591-592
See also
- Some other techniques for learning meaningful representations of data:
- manifold learning, where we try to embed points in a low-dimensional space where similar points are close together
- sparse coding, a generative model similar to ICA, but which gives an overcomplete representation (i.e. one larger than the input representation)