constructing kernels

(2.2 hours to learn)


The kernel trick allows us to reformulate linear machine learning models in terms of a kernel function, which defines a notion of similarity between data points. A few simple rules allow us to construct kernels that capture a wide variety of similarity functions.
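Below is a minimal Python sketch (not drawn from the resources listed here) illustrating two of these construction rules numerically: the sum and the elementwise product of valid kernels yield Gram matrices that remain positive semidefinite. The choice of linear and RBF kernels and the random test points are arbitrary, for illustration only.

```python
import numpy as np

def linear_kernel(X, Y):
    # k(x, y) = x . y
    return X @ Y.T

def rbf_kernel(X, Y, lengthscale=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))
    sq_dists = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq_dists / (2 * lengthscale**2))

def is_psd(K, tol=1e-8):
    # A symmetric matrix is PSD iff its eigenvalues are (numerically) nonnegative.
    return np.all(np.linalg.eigvalsh((K + K.T) / 2) >= -tol)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))

K_lin = linear_kernel(X, X)
K_rbf = rbf_kernel(X, X)

print(is_psd(K_lin + K_rbf))   # sum of kernels -> kernel
print(is_psd(K_lin * K_rbf))   # elementwise product of kernels -> kernel
print(is_psd(3.0 * K_rbf))     # positive rescaling of a kernel -> kernel
```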


This concept has the prerequisites:

Core resources (read/watch one of the following)


Gaussian Processes for Machine Learning
A graduate-level machine learning textbook focusing on Gaussian processes.
Authors: Carl E. Rasmussen, Christopher K. I. Williams
Other notes:
  • Don't worry about the parts about spectral density if you're not familiar with Fourier techniques. Section 4.2.4, on constructing new kernels from simpler kernels, is especially useful.


Supplemental resources (the following are optional, but you may find them useful)


Bayesian Reasoning and Machine Learning
A textbook for a graduate machine learning course.
Author: David Barber

See also

  • The Schur product theorem justifies the surprising fact that the product of two kernels is again a kernel (see the numerical sketch after this list).
  • Kernels can be defined on a wide variety of mathematical objects, which lets us extend linear machine learning models to those settings; Fisher kernels, for example, are a general recipe for obtaining a kernel from a generative model.
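As a small numerical check of the Schur product theorem mentioned above, the sketch below verifies that the elementwise (Hadamard) product of two positive semidefinite matrices is again positive semidefinite. The matrix sizes, the random construction, and the number of trials are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_psd(n):
    # A @ A.T is positive semidefinite for any real matrix A.
    A = rng.standard_normal((n, n))
    return A @ A.T

for _ in range(100):
    P, Q = random_psd(10), random_psd(10)
    eigs = np.linalg.eigvalsh(P * Q)  # '*' is the elementwise (Hadamard) product
    assert eigs.min() >= -1e-8        # no significantly negative eigenvalues

print("Hadamard products of PSD matrices stayed PSD in all trials")
```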