the curse of dimensionality
(1.2 hours to learn)
The curse of dimensionality refers to a collection of counterintuitive properties of high-dimensional spaces which make it difficult to learn using purely local algorithms such as K nearest neighbors.
This concept has the prerequisites:
- K nearest neighbors (The curse of dimensionality is especially bad for nonparametric learning algorithms like KNN.)
- weak law of large numbers (The law of large numbers is needed to analyze the average distances between points in high dimensions.)
- Understand the properties of high-dimensional spaces that make it difficult to learn based on nearest neighbors, e.g.
- if points are sampled uniformly within a high-dimensional box, most distances are similar
- most of the volume of a ball is concentrated near the boundary
- uniformly sampled points are unlikely to have very close neighbors
- If the input variables are sampled uniformly from a D-dimensional box, and one uses KNN as the learning algorithm, why can the number of training examples needed to achieve a given accuracy be exponential in D?
Core resources (read/watch one of the following)
→ The Elements of Statistical Learning
A graudate-level statistical learning textbook with a focus on frequentist methods.
→ Coursera: Machine Learning
An online machine learning course aimed at advanced undergraduates.
Location: Lecture "The curse of dimensionality"
- Click on "Preview" to see the videos.
- One way around the curse of dimensionality is to use parametric models such as linear regression .
- Another solution is to apply a dimensionality reduction algorithm such as principal component analysis (PCA) .
- create concept: shift + click on graph
- change concept title: shift + click on existing concept
- link together concepts: shift + click drag from one concept to another
- remove concept from graph: click on concept then press delete/backspace
- add associated content to concept: click the small circle that appears on the node when hovering over it
- other actions: use the icons in the upper right corner to optimize the graph placement, preview the graph, or download a json representation