the support vector machine
(1.5 hours to learn)
The support vector machine (SVM) is a classification algorithm that fits the hyperplane maximizing the margin, i.e. the smallest distance from any training example to the decision boundary. Its main advantage is that it can be kernelized, allowing it to represent complex nonlinear decision boundaries. Conveniently, the kernelized representation only requires explicitly computing kernel values against the support vectors, typically a small fraction of the training points.
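As a concrete illustration of these two points, here is a minimal sketch using scikit-learn (an assumed dependency, not one of the resources listed below); it fits an RBF-kernel SVM to a dataset that no hyperplane can separate, and shows that the learned decision function depends only on the support vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D dataset: the classes are separated by a circle, so no
# hyperplane in the original feature space can separate them.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

# An RBF-kernel SVM can represent this nonlinear decision boundary.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

# The kernelized decision function is a weighted sum of kernel values
# against the support vectors only -- typically a small fraction of
# the training set.
print(f"{len(clf.support_vectors_)} support vectors out of {len(X)} points")
print(clf.predict([[0.0, 0.0], [2.0, 2.0]]))  # one point inside, one outside
```

The fraction of training points retained as support vectors depends in part on the regularization parameter C.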
This concept has the prerequisites:
Core resources (read/watch one of the following)
→ Stanford's Machine Learning lecture notes
Lecture notes for Stanford's machine learning course, aimed at graduate and advanced undergraduate students.
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Section 7.1, up to 7.1.1 (pages 326-331)
- Lagrange duality
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location: Section 14.5-14.5.2.2, pages 496-502
Supplemental resources (the following are optional, but you may find them useful)
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
- Lagrange duality
→ Convex Optimization
A graduate-level textbook on convex optimization.
→ Coursera: Machine Learning (2013)
An online machine learning course aimed at a broad audience.
- logistic regression
- Click on "Preview" to see the videos.
→ Artificial Intelligence: a Modern Approach
A textbook giving a broad overview of all of AI.
Location: Section 20.6, pages 749-752
- If the training set is not linearly separable, the soft-margin SVM allows some of the constraints to be violated.
- The SVM can be optimized with the sequential minimal optimization (SMO) algorithm.
- The main advantage of SVMs is that they can be kernelized in order to capture nonlinear dependencies.
- The SVM is closely related to [logistic regression](logistic_regression), and it is instructive to compare the two loss functions (see the sketch after this list).
- The SVM can be justified as optimizing a tradeoff between bias and variance.
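To make the loss-function comparison concrete, here is a small sketch (assuming NumPy and the y ∈ {-1, +1} labeling convention) that evaluates the SVM's hinge loss against the logistic loss at a few margin values:

```python
import numpy as np

# Both losses are functions of the margin m = y * f(x), with y in {-1, +1}.
def hinge_loss(m):
    # SVM hinge loss: exactly zero once the margin exceeds 1.
    return np.maximum(0.0, 1.0 - m)

def logistic_loss(m):
    # Logistic regression loss: smooth and strictly positive everywhere.
    return np.log1p(np.exp(-m))

margins = np.array([-2.0, 0.0, 1.0, 2.0])
print("hinge:   ", hinge_loss(margins))
print("logistic:", logistic_loss(margins))
# Both grow roughly linearly for badly misclassified points, but only the
# hinge loss ignores confidently correct points -- the source of the SVM's
# sparse (support-vector-only) solution.
```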