# independent component analysis

(1.1 hours to learn)

## Summary

Independent component analysis (ICA) is a latent variable model in which the observations are modeled as linear combinations of latent variables, which are usually drawn from a heavy-tailed (non-Gaussian) distribution. Common uses include source separation and sparse dictionary learning.
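As a minimal sketch of the source separation use case: mix two independent non-Gaussian signals with a known matrix, then recover them with ICA. This uses scikit-learn's `FastICA`; the signals, mixing matrix, and parameter choices are illustrative, not from the resources above.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sign(np.sin(3 * t))        # square wave: non-Gaussian source
s2 = rng.laplace(size=t.size)      # Laplacian noise: heavy-tailed source
S = np.column_stack([s1, s2])      # true sources, one column each

A = np.array([[1.0, 0.5],          # mixing matrix (illustrative)
              [0.4, 1.0]])
X = S @ A.T                        # observed linear mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)       # estimated sources, up to permutation and scale
```

Note that ICA can only recover the sources up to permutation and sign/scale, since any rescaling of a source can be absorbed into the mixing matrix.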

## Context

This concept has the prerequisites:

- heavy-tailed distributions (ICA typically involves fitting heavy-tailed distributions.)
- principal component analysis (PCA is often used as a preprocessing step, and it's useful to compare ICA with PCA.)
- maximum likelihood (ICA is usually fit using maximum likelihood.)
- determinant (The maximum likelihood objective function includes a determinant.)
- orthonormal bases (Standard ICA has an orthogonality constraint.)
- multivariate Gaussian distribution (ICA requires that the component distributions be non-Gaussian.)
- optimization problems (Fitting ICA requires solving an optimization problem.)
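Several of these prerequisites meet in the maximum likelihood objective. For an unmixing matrix W, the log-likelihood of an observation x is the sum of the source log-densities of the recovered components Wx plus a log |det W| term from the change of variables. A sketch in NumPy, assuming standard Laplace source densities (an illustrative choice of heavy-tailed distribution):

```python
import numpy as np

def ica_log_likelihood(W, X):
    """Average ICA log-likelihood of data X (n x d) under unmixing matrix W (d x d),
    assuming independent standard Laplace sources p(s) = 0.5 * exp(-|s|)."""
    S = X @ W.T                               # recovered sources, one row per example
    log_prior = np.sum(np.log(0.5) - np.abs(S), axis=1)  # sum of source log-densities
    _, logdet = np.linalg.slogdet(W)          # log |det W|, computed stably
    return np.mean(log_prior + logdet)
```

The log |det W| term penalizes degenerate unmixing matrices: without it, the objective could be driven up simply by shrinking W toward zero.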

## Core resources (read/watch one of the following)

## -Free-

→ Information Theory, Inference, and Learning Algorithms

A graduate-level textbook on machine learning and information theory.

→ Stanford's Machine Learning lecture notes

Lecture notes for Stanford's machine learning course, aimed at graduate and advanced undergraduate students.

## -Paid-

→ Machine Learning: a Probabilistic Perspective

A very comprehensive graduate-level machine learning textbook.

Location:
Sections 12.6-12.6.1, pages 407-411

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ The Elements of Statistical Learning

A graduate-level statistical learning textbook with a focus on frequentist methods.

Additional dependencies:

- differential entropy

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location:
Section 12.4.1, pages 591-592

## See also

- Some other techniques for learning meaningful representations of data:
  - manifold learning, where we try to embed points in a low-dimensional space where similar points are close together
  - sparse coding, a generative model similar to ICA, but which gives an overcomplete representation (i.e. one larger than the input representation)