the support vector machine

(1.5 hours to learn)

Summary

The support vector machine (SVM) is a classification algorithm that fits the hyperplane which maximizes the margin, i.e., the smallest distance between any training example and the decision boundary. Its main advantage is that it can be kernelized, which allows it to represent complex nonlinear decision boundaries. Conveniently, the kernelized representation only requires computing kernel values with a small fraction of the data points, the support vectors.
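To make the summary concrete, here is a minimal sketch in Python/NumPy (the function names `geometric_margin`, `kernel_decision`, `linear_kernel` and `rbf_kernel` are illustrative, not from any library). It computes the margin of a hyperplane (w, b) as the smallest value of y_i (w · x_i + b) / ||w|| over the training set, and evaluates the kernelized decision function f(x) = Σ_i α_i y_i K(x_i, x) + b using only the points with nonzero dual coefficients α_i. The toy data and its α values are the exact max-margin solution for that data, worked out by hand from the dual; they are not output from a solver.

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Smallest distance y_i (w . x_i + b) / ||w|| from any example to the hyperplane.

    The value is positive only if every example lies on its correct side.
    """
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

def linear_kernel(u, v):
    return np.dot(u, v)

def rbf_kernel(u, v, gamma=1.0):
    # Gaussian (RBF) kernel; gamma=1.0 is an arbitrary illustrative default
    return np.exp(-gamma * np.sum((u - v) ** 2))

def kernel_decision(x, X, y, alpha, b, kernel=linear_kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b, evaluated over support vectors only.

    Training points with alpha_i == 0 contribute nothing, so they are skipped.
    """
    sv = alpha > 1e-8
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha[sv], y[sv], X[sv])) + b

# Toy separable data: two points per class along the line x1 == x2
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Max-margin solution for this data (derived by hand from the dual):
# only the first and third points, (1, 1) and (-1, -1), are support vectors.
alpha = np.array([0.25, 0.0, 0.25, 0.0])
b = 0.0
w = (alpha * y) @ X  # w = sum_i alpha_i y_i x_i = (0.5, 0.5)

print(geometric_margin(w, b, X, y))                            # sqrt(2), about 1.414
print(kernel_decision(np.array([2.0, 2.0]), X, y, alpha, b))   # 2.0, same as w . x + b
```

Passing `kernel=rbf_kernel` to `kernel_decision` gives a nonlinear decision function with the same sparsity: only the support vectors are ever touched, which is the property the summary describes.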

Context

This concept has the prerequisites:

Core resources (read/watch one of the following)

-Free-

Stanford's Machine Learning lecture notes
Lecture notes for Stanford's machine learning course, aimed at graduate and advanced undergraduate students.
Author: Andrew Y. Ng

Supplemental resources (the following are optional, but you may find them useful)

-Free-

The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Authors: Trevor Hastie, Robert Tibshirani, Jerome Friedman
Additional dependencies:
  • Lagrange duality
Convex Optimization
A graduate-level textbook on convex optimization.
Authors: Stephen Boyd, Lieven Vandenberghe
Coursera: Machine Learning (2013)
An online machine learning course aimed at a broad audience.
Author: Andrew Y. Ng
Additional dependencies:
  • logistic regression
Other notes:
  • Click on "Preview" to see the videos.

See also

  • If the training set is not linearly separable, the soft-margin SVM allows some of the margin constraints to be violated, at a penalty.
  • The SVM can be optimized with the sequential minimal optimization (SMO) algorithm.
  • The main advantage of SVMs is that they can be kernelized in order to capture nonlinear dependencies.
  • The SVM is closely related to [logistic regression](logistic_regression), and it is instructive to compare the loss functions.
  • The SVM can be justified as optimizing a tradeoff between bias and variance.
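The soft-margin and logistic-regression bullets above can be made concrete by writing both losses as functions of the margin z = y · f(x). The sketch below (Python/NumPy; all names are illustrative) defines the hinge loss max(0, 1 − z) used by the soft-margin SVM, the logistic loss log(1 + exp(−z)), and the primal soft-margin objective ½‖w‖² + C Σ_i max(0, 1 − y_i(w · x_i + b)), where each hinge term is the slack an example needs.

```python
import numpy as np

def hinge_loss(z):
    """SVM (hinge) loss as a function of the margin z = y * f(x); zero once z >= 1."""
    return np.maximum(0.0, 1.0 - z)

def logistic_loss(z):
    """Logistic regression loss log(1 + exp(-z)), computed without overflow."""
    return np.logaddexp(0.0, -z)

def soft_margin_objective(w, b, X, y, C=1.0):
    """Primal soft-margin objective 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b)).

    Each hinge term equals the slack xi_i that example i needs to satisfy
    y_i (w . x_i + b) >= 1 - xi_i. Larger C punishes violations more heavily.
    """
    z = y * (X @ w + b)
    return 0.5 * np.dot(w, w) + C * np.sum(hinge_loss(z))

# Compare the two losses at a few margins
zs = np.array([-2.0, 0.0, 1.0, 3.0])
for z, h, l in zip(zs, hinge_loss(zs), logistic_loss(zs)):
    print(f"z = {z:+.1f}  hinge = {h:.3f}  logistic = {l:.3f}")
```

The hinge loss is exactly zero for every example with z ≥ 1, so points comfortably outside the margin have no influence on the solution; this is where the sparsity of the SVM comes from. The logistic loss is positive everywhere, so every example contributes. In the objective, C sets the trade-off between the two terms: a small C tolerates more margin violations and yields a simpler, higher-bias classifier, while a large C fits the training data more closely at the risk of higher variance.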
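For the SMO bullet above, here is a minimal sketch of a simplified SMO variant in Python/NumPy. It repeatedly picks a multiplier α_i that violates the KKT conditions, pairs it with a second index j, and solves the two-variable subproblem analytically while keeping 0 ≤ α ≤ C and Σ_k α_k y_k = 0. As a simplification (an assumption of this sketch, not part of Platt's full algorithm), j is chosen at random rather than with SMO's working-set heuristics; the function name and parameters are hypothetical.

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, max_iter=1000,
                   kernel=np.dot, seed=0):
    """Solve the soft-margin SVM dual with a simplified SMO variant.

    Each step optimizes two multipliers (alpha_i, alpha_j) analytically while
    holding the rest fixed; j is picked at random. y must hold labels in
    {-1, +1}. Returns (alpha, b).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
    alpha, b = np.zeros(n), 0.0
    passes = it = 0
    while passes < max_passes and it < max_iter:
        it += 1
        num_changed = 0
        for i in range(n):
            E_i = (alpha * y) @ K[:, i] + b - y[i]  # f(x_i) - y_i
            # Only touch alpha_i if it violates the KKT conditions by more than tol
            if not ((y[i] * E_i < -tol and alpha[i] < C) or
                    (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = int(rng.integers(n - 1))
            if j >= i:
                j += 1  # random j != i
            E_j = (alpha * y) @ K[:, j] + b - y[j]
            a_i, a_j = alpha[i], alpha[j]
            # [L, H] keeps 0 <= alpha <= C and preserves sum_k alpha_k y_k = 0
            if y[i] != y[j]:
                L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            eta = 2 * K[i, j] - K[i, i] - K[j, j]
            if L == H or eta >= 0:
                continue
            new_a_j = np.clip(a_j - y[j] * (E_i - E_j) / eta, L, H)
            if abs(new_a_j - a_j) < 1e-5:
                continue  # negligible progress; leave the pair unchanged
            alpha[j] = new_a_j
            alpha[i] = a_i + y[i] * y[j] * (a_j - new_a_j)
            # Choose b so the KKT conditions hold for an unbounded multiplier
            b1 = (b - E_i - y[i] * (alpha[i] - a_i) * K[i, i]
                  - y[j] * (alpha[j] - a_j) * K[i, j])
            b2 = (b - E_j - y[i] * (alpha[i] - a_i) * K[i, j]
                  - y[j] * (alpha[j] - a_j) * K[j, j])
            if 0 < alpha[i] < C:
                b = b1
            elif 0 < alpha[j] < C:
                b = b2
            else:
                b = (b1 + b2) / 2
            num_changed += 1
        passes = passes + 1 if num_changed == 0 else 0
    return alpha, b

X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha, b = simplified_smo(X, y, C=10.0)
w = (alpha * y) @ X  # recovering w this way is valid for the linear kernel only
print(alpha, b, w)
```

On this toy data the exact max-margin solution is w = (0.5, 0.5), b = 0, with only (1, 1) and (−1, −1) as support vectors, so the sketch should land close to that. Random choice of j keeps the code short but converges more slowly than SMO's heuristics on larger problems.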