the support vector machine
(1.5 hours to learn)
Summary
The support vector machine (SVM) is a classification algorithm that fits a hyperplane maximizing the margin, i.e. the smallest distance between any training example and the decision boundary. The main advantage is that SVMs can be kernelized, allowing them to represent complex nonlinear decision boundaries. Conveniently, the kernelized representation only requires explicitly computing kernels with a small fraction of the data points (the support vectors).
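To make the notion of margin concrete, here is a minimal sketch in plain Python. The function name `geometric_margin`, the toy data, and the two candidate hyperplanes are all made up for illustration; this only evaluates the margin of a given hyperplane and does not train an SVM.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any example to the hyperplane w.x + b = 0.

    Labels are +1 or -1. A positive result means every example is on the
    correct side of the boundary.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    distances = [
        label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, label in zip(points, labels)
    ]
    return min(distances)

# Hypothetical, linearly separable toy data.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two candidate separating hyperplanes (illustrative choices).
margin_a = geometric_margin((1.0, 1.0), -2.0, points, labels)  # boundary x1 + x2 = 2
margin_b = geometric_margin((1.0, 0.0), -1.0, points, labels)  # boundary x1 = 1

print(margin_a, margin_b)
```

Both hyperplanes classify the toy data correctly, but the first leaves a larger gap to the nearest example, so it is the one a maximum-margin criterion would prefer among these two.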
Context
This concept has the prerequisites:
- binary linear classifiers (The SVM is a kind of binary linear classifier.)
- convex optimization (The SVM is formulated as a convex optimization problem.)
Core resources (read/watch one of the following)
-Free-
→ Stanford's Machine Learning lecture notes
Lecture notes for Stanford's machine learning course, aimed at graduate and advanced undergraduate students.
-Paid-
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location:
Section 7.1, up to 7.1.1 (pages 326-331)
Additional dependencies:
- Lagrange duality
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location:
Section 14.5-14.5.2.2, pages 496-502
Supplemental resources (the following are optional, but you may find them useful)
-Free-
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Additional dependencies:
- Lagrange duality
→ Convex Optimization
A graduate-level textbook on convex optimization.
→ Coursera: Machine Learning (2013)
An online machine learning course aimed at a broad audience.
Additional dependencies:
- logistic regression
Other notes:
- Click on "Preview" to see the videos.
-Paid-
→ Artificial Intelligence: a Modern Approach
A textbook giving a broad overview of all of AI.
Location:
Section 20.6, pages 749-752
See also
- If the training set is not linearly separable, the soft-margin SVM allows for some of the constraints to be violated.
- The SVM can be optimized with the sequential minimal optimization (SMO) algorithm.
- The main advantage of SVMs is that they can be kernelized in order to capture nonlinear dependencies.
- The SVM is closely related to [logistic regression](logistic_regression), and it is instructive to compare the loss functions.
- The SVM can be justified as optimizing a tradeoff between bias and variance.
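The loss-function comparison mentioned in the list above can be sketched in a few lines of Python. This assumes the standard formulations, which are not spelled out on this page: the soft-margin SVM corresponds to the hinge loss, logistic regression to the logistic (log) loss, and both are written as functions of the signed score z = y * f(x) with labels in {+1, -1}. The function names are illustrative.

```python
import math

def hinge_loss(z):
    """Hinge loss: exactly zero once the example clears the margin (z >= 1)."""
    return max(0.0, 1.0 - z)

def logistic_loss(z):
    """Logistic loss: always positive, decaying smoothly as z grows."""
    return math.log1p(math.exp(-z))

# Evaluate both losses at a few signed scores z = y * f(x).
for z in [-2.0, 0.0, 1.0, 3.0]:
    print(f"z={z:+.1f}  hinge={hinge_loss(z):.4f}  logistic={logistic_loss(z):.4f}")
```

Both losses grow roughly linearly for badly misclassified examples. The difference is on the other side: the hinge loss is exactly zero for examples beyond the margin, so those examples do not influence the solution, whereas the logistic loss never quite reaches zero.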