K nearest neighbors
(4.1 hours to learn)
Summary
K nearest neighbors (KNN) is a simple machine learning algorithm: to make a prediction at a query point, it averages the labels of the K nearest training examples (or, for classification, takes a majority vote among them). It is a canonical example of a nonparametric learning algorithm. It can learn arbitrarily complex functions, but it is especially sensitive to the curse of dimensionality.
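To make the prediction rule concrete, here is a minimal sketch in Python, assuming NumPy and Euclidean distance; the function name knn_predict and the toy data are illustrative, not taken from any of the resources listed below. Note where the work happens: "training" is just storing the data, and all computation is done at prediction time.

    import numpy as np

    def knn_predict(X_train, y_train, x_query, k=5):
        """Predict at x_query by averaging the labels of its k nearest
        training points under Euclidean distance (a toy sketch)."""
        # Distance from the query point to every stored training point
        dists = np.linalg.norm(X_train - x_query, axis=1)
        # Indices of the k closest training points
        nearest = np.argsort(dists)[:k]
        # Average their labels: a mean for regression; for binary
        # classification, thresholding this mean at 0.5 gives a majority vote
        return y_train[nearest].mean()

    # Toy usage: four labeled points in 2-D, query near the "label 1" cluster
    X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([2.5, 2.5]), k=3))  # ~0.67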
Context
This concept has the prerequisites:
- independent events (Independence is needed to analyze K nearest neighbors.)
Goals
- Know what the K nearest neighbors algorithm is.
- Be aware of the tradeoffs of KNN vs. parametric models, including:
- KNN requires no work at training time, but lots of work at test time
- KNN can learn arbitrarily complex functions
- KNN requires storing the entire training set
- KNN is especially prone to the curse of dimensionality
- Derive the asymptotic classification error of KNN, both with K = 1 and with K increasing (the classic result is summarized just below).
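As a pointer for the last goal, the standard asymptotic result for binary classification (due to Cover and Hart) can be stated as follows; the derivation itself is covered in the resources below. Writing $\eta(x) = P(y = 1 \mid x)$ and $R^*$ for the Bayes error, the 1-NN error converges, as the training set grows, to

\[
  R_{1\mathrm{NN}} = \mathbb{E}_x\big[\, 2\,\eta(x)\,(1 - \eta(x)) \,\big],
  \qquad
  R^* \;\le\; R_{1\mathrm{NN}} \;\le\; 2 R^* (1 - R^*) \;\le\; 2 R^*,
\]

and if $K \to \infty$ while $K/n \to 0$, the KNN error converges to the Bayes error $R^*$ itself.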
Core resources (read/watch one of the following)
-Free-
→ Coursera: Machine Learning
An online machine learning course aimed at advanced undergraduates.
Other notes:
- Watch the Week One lectures if you're not familiar with the basic machine learning setup.
- Click on "Preview" to see the videos.
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Supplemental resources (the following are optional, but you may find them useful)
-Paid-
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location:
Section 2.5.2, "Nearest-neighbor methods"
See also
- KNN is especially prone to the curse of dimensionality.
- KNN is a canonical example of a nonparametric learning algorithm. Other examples include: