the curse of dimensionality

(1.2 hours to learn)

Summary

The curse of dimensionality refers to a collection of counterintuitive properties of high-dimensional spaces which make it difficult to learn using purely local algorithms such as K nearest neighbors.

Context

This concept has the prerequisites:

Goals

  • Understand the properties of high-dimensional spaces that make it difficult to learn based on nearest neighbors, e.g.
    • if points are sampled uniformly within a high-dimensional box, most distances are similar
    • most of the volume of a ball is concentrated near the boundary
    • uniformly sampled points are unlikely to have very close neighbors
  • If the input variables are sampled uniformly from a D-dimensional box, and one uses KNN as the learning algorithm, why can the number of training examples needed to achieve a given accuracy be exponential in D?

Core resources (read/watch one of the following)

-Free-

The Elements of Statistical Learning
A graudate-level statistical learning textbook with a focus on frequentist methods.
Authors: Trevor Hastie,Robert Tibshirani,Jerome Friedman
Coursera: Machine Learning
An online machine learning course aimed at advanced undergraduates.
Author: Pedro Domingos
Other notes:
  • Click on "Preview" to see the videos.

See also