# K nearest neighbors

(4.1 hours to learn)

## Summary

K nearest neighbors (KNN) is a simple machine learning algorithm: to make a prediction, it finds the K training points closest to the query and averages their labels. It is a canonical example of a nonparametric learning algorithm. Its advantage is that it can learn arbitrarily complex functions; its drawback is that it is especially sensitive to the curse of dimensionality.
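
To make the averaging concrete, here is a minimal NumPy sketch of KNN regression (the function name `knn_predict` and the toy data are invented for illustration):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    """Predict a label for x_query by averaging the labels of its
    k nearest training points under Euclidean distance."""
    # There is no training step: all the work happens at query time,
    # and the full training set must be kept in memory.
    dists = np.linalg.norm(X_train - x_query, axis=1)  # O(n * d) per query
    nearest = np.argsort(dists)[:k]                    # indices of k closest points
    return y_train[nearest].mean()                     # average their labels

# Toy usage: 1-D regression
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
print(knn_predict(X_train, y_train, np.array([1.5]), k=2))  # -> 2.5
```

Note that each prediction scans the entire training set, which is why KNN is cheap to "train" but expensive to query.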

## Context

This concept has the prerequisites:

- independent events (Independence is needed to analyze K nearest neighbors.)

## Goals

- Know what the K nearest neighbors algorithm is.

- Be aware of the tradeoffs of KNN vs. parametric models, including:
  - KNN requires no work at training time, but lots of work at test time
  - KNN can learn arbitrarily complex functions
  - KNN requires storing the entire training set
  - KNN is especially prone to the curse of dimensionality

- Derive the asymptotic classification error of KNN, both with K = 1 and with K increasing (the Cover-Hart bound is sketched after this list).
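
For the last goal, a useful reference point is the Cover-Hart bound on the asymptotic error of 1-NN. This is a sketch of the statement only (with R^* the Bayes error and c the number of classes); the derivation is covered in the resources below:

```latex
% Asymptotic (n -> infinity) error of 1-NN vs. the Bayes error R^*
% for a c-class problem (the Cover-Hart bound):
R^* \le R_{1\mathrm{NN}} \le R^* \left( 2 - \frac{c}{c-1} R^* \right)
% In the binary case (c = 2) the upper bound simplifies to 2 R^* (1 - R^*).
% If instead K grows with n such that K/n -> 0, the KNN error converges to R^*.
```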

## Core resources (read/watch one of the following)

## -Free-

→ Coursera: Machine Learning

An online machine learning course aimed at advanced undergraduates.

Other notes:

- Watch the Week One lectures if you're not familiar with the basic machine learning setup.
- Click on "Preview" to see the videos.

→ The Elements of Statistical Learning

A graduate-level statistical learning textbook with a focus on frequentist methods.

## Supplemental resources (the following are optional, but you may find them useful)

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location:
Section 2.5.2, "Nearest-neighbor methods"

## See also

- KNN is especially prone to the curse of dimensionality (see the sketch after this list).
- KNN is a canonical example of a nonparametric learning algorithm. Other examples include kernel regression and Gaussian processes.
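
A quick numerical illustration of why high dimensions hurt KNN (a standard back-of-the-envelope argument, not taken from the resources above): for uniformly distributed data in the unit hypercube, a cubical neighborhood capturing a fraction p of the data must have edge length p^(1/d).

```python
# Edge length of a hypercube neighborhood that captures 1% of
# uniformly distributed data in d dimensions: 0.01 ** (1/d).
for d in [1, 2, 10, 100]:
    edge = 0.01 ** (1.0 / d)
    print(f"d = {d:>3}: edge length to cover 1% of the data = {edge:.3f}")
# For d = 10 the "local" neighborhood already spans ~63% of each axis,
# so the nearest neighbors are no longer local in any meaningful sense.
```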