K nearest neighbors

(4.1 hours to learn)

Summary

K nearest neighbors is a very simple machine learning algorithm which simply averages the labels of the K nearest neighbors in the training set. It is a canonical example of a nonparametric learning algorithm. It has the advantage that it can learn arbitrarily complex functions, but it is especially sensitive to the curse of dimensionality.

Context

This concept has the prerequisites:

Goals

  • Know what the K nearest neighbors algorithm is.
  • Be aware of the tradeoffs of KNN vs. parameteric models, including:
    • KNN requires no work at training time, but lots of work at test time
    • KNN can learn arbitrarily complex functions
    • KNN requires storing the entire training set
    • KNN is especially prone to the curse of dimensionality
  • Derive the asymptotic classification error of KNN, both with K = 1 and with K increasing.

Core resources (read/watch one of the following)

-Free-

Coursera: Machine Learning
An online machine learning course aimed at advanced undergraduates.
Author: Pedro Domingos
Other notes:
  • Watch the Week One lectures if you're not familiar with the basic machine learning setup.
  • Click on "Preview" to see the videos.
The Elements of Statistical Learning
A graudate-level statistical learning textbook with a focus on frequentist methods.
Authors: Trevor Hastie,Robert Tibshirani,Jerome Friedman

Supplemental resources (the following are optional, but you may find them useful)

-Paid-

See also