kernel SVM
(30 minutes to learn)
Summary
The main advantage of the SVM over other linear classifiers is that it can be kernelized to represent complex nonlinear decision boundaries. Conveniently, since only a (hopefully small) subset of the training examples end up as support vectors, predictions require kernel evaluations against just that fraction of the training set. Kernel SVMs are among the most widely used classifiers in machine learning, because off-the-shelf tools often perform very well.
Context
This concept has the prerequisites:
- SVM optimality conditions (Kernelizing the SVM requires deriving the optimality conditions; the dual form sketched just after this list shows where the kernel enters.)
- the kernel trick
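To see how these prerequisites fit together: the training data enter the SVM dual problem only through inner products, which the kernel trick replaces with kernel evaluations. In standard soft-margin notation (as in, e.g., Bishop Section 7.1 below):

```latex
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
      \alpha_i \alpha_j \, y_i y_j \, k(x_i, x_j)
\quad \text{s.t.}\quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```

The resulting classifier is $f(x) = \operatorname{sign}\big(\sum_{i \in \mathrm{SV}} \alpha_i y_i \, k(x_i, x) + b\big)$, where the sum runs over the support vectors only.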
Core resources (read/watch one of the following)
-Free-
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Supplemental resources (the following are optional, but you may find them useful)
-Free-
→ Coursera: Machine Learning (2013)
An online machine learning course aimed at a broad audience.
Location:
Lecture "Kernels II"
Other notes:
- Click on "Preview" to see the videos.
-Paid-
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location:
Section 14.5-14.5.2.2, pages 496-502
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location:
Section 7.1, up to 7.1.1, pages 326-331
See also
- The kernel SVM can be optimized with the sequential minimal optimization (SMO) algorithm.
- Techniques for constructing kernels
- Other examples of kernelized models