# the curse of dimensionality

(1.2 hours to learn)

## Summary

The curse of dimensionality refers to a collection of counterintuitive properties of high-dimensional spaces that make learning difficult for purely local algorithms such as K nearest neighbors (KNN).

## Context

This concept has the prerequisites:

- K nearest neighbors (The curse of dimensionality is especially bad for nonparametric learning algorithms like KNN.)
- weak law of large numbers (The law of large numbers is needed to analyze the average distances between points in high dimensions.)

## Goals

- Understand the properties of high-dimensional spaces that make learning based on nearest neighbors difficult, e.g.:
    - if points are sampled uniformly within a high-dimensional box, most pairwise distances are similar
    - most of the volume of a ball is concentrated near its boundary
    - uniformly sampled points are unlikely to have very close neighbors
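A quick numerical sketch of the first two properties (this simulation is illustrative, not from the resources; point counts and dimensions are arbitrary choices): pairwise distances between uniform samples concentrate as the dimension grows, and the closed form 1 - (1 - eps)^D gives the fraction of a ball's volume within eps of its boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_spread(dim, n_points=200):
    """Relative spread (max - min) / min of pairwise Euclidean distances
    between points sampled uniformly in the unit box [0, 1]^dim."""
    pts = rng.uniform(size=(n_points, dim))
    # Pairwise squared distances via |a - b|^2 = |a|^2 + |b|^2 - 2 a.b
    sq = (pts ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * pts @ pts.T, 0.0)
    dists = np.sqrt(d2[np.triu_indices(n_points, k=1)])  # each pair once
    return (dists.max() - dists.min()) / dists.min()

def shell_fraction(dim, eps=0.01):
    """Fraction of a dim-dimensional ball's volume lying within distance
    eps of its boundary: 1 - (1 - eps)**dim."""
    return 1 - (1 - eps) ** dim

for dim in (2, 10, 100, 1000):
    print(dim, distance_spread(dim), shell_fraction(dim))
```

In low dimensions the nearest pair is much closer than the farthest, so the spread is large; by D = 1000 all distances are within a few tens of percent of each other, and essentially all of the ball's volume sits in a 1%-thick shell at the surface.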

- Understand why, if the input variables are sampled uniformly from a D-dimensional box and KNN is used as the learning algorithm, the number of training examples needed to achieve a given accuracy can be exponential in D.
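The exponential dependence on D can be seen from a grid-covering argument (a back-of-the-envelope sketch, with eps chosen arbitrarily): split the unit box into cells of side eps; for every query point to have a training example within eps in each coordinate, each cell needs at least one example, i.e. roughly (1/eps)^D examples.

```python
# Number of grid cells of side eps needed to cover the unit box [0, 1]^D.
# KNN needs at least one training example per cell to guarantee a "close"
# neighbor everywhere, so sample complexity grows like (1/eps)**D.
eps = 0.1
for dim in (1, 2, 5, 10, 20):
    print(dim, round((1 / eps) ** dim))
```

Even at the modest resolution eps = 0.1, twenty dimensions already demand on the order of 10^20 examples.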

## Core resources (read/watch one of the following)

### Free

→ The Elements of Statistical Learning

A graduate-level statistical learning textbook with a focus on frequentist methods.

→ Coursera: Machine Learning

An online machine learning course aimed at advanced undergraduates.

Location:
Lecture "The curse of dimensionality"

Other notes:

- Click on "Preview" to see the videos.

## See also

- One way around the curse of dimensionality is to use parametric models such as linear regression.
- Another solution is to apply a dimensionality reduction algorithm such as principal component analysis (PCA).