# kernel SVM

(30 minutes to learn)

## Summary

The main advantage of the SVM as a linear classifier is that it can be kernelized to represent complex nonlinear decision boundaries. Conveniently, because the learned classifier depends only on a (hopefully) sparse subset of the training examples, the support vectors, kernels only need to be evaluated against a small fraction of the training set when making predictions. Kernel SVMs are among the most widely used classifiers in machine learning, in part because off-the-shelf tools often perform very well.
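To make the sparsity point concrete, here is a minimal sketch in plain Python. It is not the method used by any of the resources listed on this page, and it makes several simplifying assumptions: the bias term is dropped (so the dual has only box constraints and can be solved one coefficient at a time, unlike SMO, which updates pairs), the kernel is an RBF kernel with `gamma=1.0`, and the toy data, `C`, and the number of sweeps are arbitrary illustrative choices. The names `rbf_kernel`, `fit_dual`, and `decision_function` are hypothetical helpers, not part of any library.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def fit_dual(X, y, kernel, C=10.0, sweeps=500):
    """Coordinate ascent on the bias-free SVM dual (a simplification).

    Maximizes sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j K_ij
    subject to 0 <= alpha_i <= C.
    """
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Current decision value at training point i.
            f_i = sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            # Exact maximizer along coordinate i, then clip to the box [0, C].
            step = (1.0 - y[i] * f_i) / K[i][i]
            alpha[i] = min(C, max(0.0, alpha[i] + step))
    return alpha


def decision_function(x, support, kernel):
    """f(x) = sum over support vectors of alpha_i * y_i * k(x_i, x)."""
    return sum(a * yi * kernel(xi, x) for xi, yi, a in support)


# XOR-style toy data (not linearly separable) plus one extra point that
# the other four already classify with a margin larger than 1.
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0), (-0.5, -0.5)]
y = [-1, -1, 1, 1, -1]

alpha = fit_dual(X, y, rbf_kernel)

# Only examples with a nonzero dual coefficient are needed for prediction.
support = [(xi, yi, a) for xi, yi, a in zip(X, y, alpha) if a > 0.0]
predictions = [
    1 if decision_function(x, support, rbf_kernel) > 0 else -1 for x in X
]

print("dual coefficients:", [round(a, 3) for a in alpha])
print("support vectors:", len(support), "of", len(X))
print("training predictions:", predictions)
```

On this toy data the extra point ends up with a dual coefficient of zero, so the decision function evaluates the kernel against four training examples instead of five. On realistic datasets the fraction of support vectors depends on the kernel, its parameters, and `C`.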

## Context

This concept has the prerequisites:

- SVM optimality conditions (Kernelizing the SVM requires deriving the optimality conditions.)
- the kernel trick

## Core resources (read/watch one of the following)

## -Free-

→ The Elements of Statistical Learning

A graduate-level statistical learning textbook with a focus on frequentist methods.

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ Coursera: Machine Learning (2013)

An online machine learning course aimed at a broad audience.

Location:
Lecture "Kernels II"

Other notes:

- Click on "Preview" to see the videos.

## -Paid-

→ Machine Learning: a Probabilistic Perspective

A very comprehensive graduate-level machine learning textbook.

Location:
Sections 14.5-14.5.2.2, pages 496-502

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location:
Section 7.1, up to 7.1.1, pages 326-331

## See also

- The kernel SVM can be optimized with the sequential minimal optimization (SMO) algorithm.
- Techniques for constructing kernels
- Other examples of kernelized models include: