K nearest neighbors
(4.1 hours to learn)
K nearest neighbors is a simple machine learning algorithm: to predict for a query point, it averages the labels of the K closest points in the training set (for classification, this amounts to a majority vote). It is a canonical example of a nonparametric learning algorithm. It can represent arbitrarily complex functions, but it is especially sensitive to the curse of dimensionality.
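To make the summary concrete, here is a minimal sketch in Python/NumPy. The names (knn_predict and so on) are illustrative only, not taken from any of the resources below, and this is a brute-force version, not a production implementation:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    """Average the labels of the k training points nearest to x_query.

    For 0/1 labels, thresholding the result at 0.5 gives the usual
    majority-vote KNN classifier."""
    # All the work happens at prediction time: we compute the distance
    # from the query to every stored training point.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]  # indices of the k closest points
    return y_train[nearest].mean()

# Tiny usage example with made-up data.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
y_train = np.array([0.0, 0.0, 0.0, 1.0])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5]), k=3))  # ~0.33
```

Note that the training "phase" is just storing the data, while each query scans the entire training set; this is exactly the train/test tradeoff listed in the goals below.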
Goals:
- Know what the K nearest neighbors algorithm is.
- Be aware of the tradeoffs of KNN vs. parametric models, including:
  - KNN requires no work at training time, but lots of work at test time
  - KNN can learn arbitrarily complex functions
  - KNN requires storing the entire training set
  - KNN is especially prone to the curse of dimensionality
- Derive the asymptotic classification error of KNN, both with K = 1 and with K increasing (the classical result is stated below).
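For reference, the classical result behind the last goal is the Cover-Hart bound (the derivation itself is covered in the resources below). For binary classification with Bayes error rate R*, in the infinite-data limit:

```latex
% Asymptotic 1-NN error rate (binary classification, n -> infinity):
R^* \;\le\; R_{1\mathrm{NN}} \;\le\; 2R^*(1 - R^*) \;\le\; 2R^*

% If instead K -> infinity while K/n -> 0, the KNN error rate
% converges to the Bayes error R^* itself (consistency).
```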
Core resources (read/watch one of the following)
→ Coursera: Machine Learning
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Supplemental resources (the following are optional, but you may find them useful)
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Section 2.5.2, "Nearest-neighbor methods"