(2.2 hours to learn)
The kernel trick allows us to reformulate linear machine learning models in terms of a kernel function which defines a notion of similarity between data points. A few simple rules allow us to construct kernels which capture a wide variety of similarity functions.
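To make the idea concrete, here is a minimal sketch (our own illustration, not taken from the resources below) of kernel ridge regression in Python: the model is fit and evaluated entirely through kernel evaluations, with no explicit feature vectors.

```python
# Minimal kernel-trick sketch: kernel ridge regression never builds feature
# vectors; it only evaluates a kernel k(x, x') between pairs of data points.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Gram matrix K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * lengthscale^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))                   # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)    # noisy targets

lam = 0.1                                              # ridge regularizer
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # dual weights

X_test = np.linspace(-3, 3, 5)[:, None]
y_pred = rbf_kernel(X_test, X) @ alpha                 # predict via kernel only
print(y_pred)
```

Swapping in a different kernel function changes the model's notion of similarity without touching the rest of the code.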
This concept has the prerequisites:
Core resources (read/watch one of the following)
→ Gaussian Processes for Machine Learning
A graduate-level machine learning textbook focusing on Gaussian processes.
Location: Sections 4-4.2, pages 79-95
- Don't worry about the parts about spectral density if you're not familiar with Fourier techniques. Section 4.2.4, on constructing new kernels from simpler kernels, is especially useful.
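As a hedged illustration of those closure rules (the helper names below are our own, not the book's): sums, products, and positive scalings of kernels are again kernels, so complex similarity functions can be assembled from simple pieces.

```python
# Composing kernels from simpler kernels: sums, products, and positive
# scalings of valid kernels are again valid kernels.
import numpy as np

def rbf(x, z, ls=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * ls ** 2))

def linear(x, z):
    return float(np.dot(x, z))

def k_sum(k1, k2):
    return lambda x, z: k1(x, z) + k2(x, z)   # k1 + k2 is a kernel

def k_prod(k1, k2):
    return lambda x, z: k1(x, z) * k2(x, z)   # k1 * k2 is a kernel

def k_scale(c, k):
    return lambda x, z: c * k(x, z)           # c * k is a kernel for c > 0

# e.g. a "linear trend plus smooth local wiggles" kernel:
k = k_sum(linear, k_scale(2.0, rbf))
x, z = np.array([1.0, 2.0]), np.array([0.5, 1.5])
print(k(x, z))
```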
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Section 6.2, pages 294-299
Supplemental resources (the following are optional, but you may find them useful)
→ Bayesian Reasoning and Machine Learning
A textbook for a graduate machine learning course.
- The Schur product theorem justifies the surprising fact that the product of kernels is a kernel; see the numerical check after this list.
- Kernels can be defined on a variety of mathematical objects, allowing us to extend linear machine learning models to those cases: Fisher kernels, for example, are a general recipe for obtaining a kernel from a generative model (a small sketch follows this list).
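Here is the numerical check referenced above (our own illustration): the Schur product theorem says the elementwise (Hadamard) product of two positive semidefinite Gram matrices is again positive semidefinite, which is why the pointwise product of two kernels is a valid kernel.

```python
# Numerical check of the Schur product theorem: the elementwise product of
# two PSD Gram matrices is again PSD (up to numerical error).
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))

def gram(kernel, X):
    return np.array([[kernel(a, b) for b in X] for a in X])

K1 = gram(lambda a, b: np.exp(-np.sum((a - b) ** 2)), X)   # RBF kernel
K2 = gram(lambda a, b: (1.0 + a @ b) ** 2, X)              # polynomial kernel
K = K1 * K2                                                # Hadamard product

# all eigenvalues of a PSD matrix are >= 0 (allow tiny numerical error)
print(np.linalg.eigvalsh(K).min() >= -1e-10)
```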
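And here is a small sketch of the Fisher kernel recipe, assuming the simplest possible generative model, a 1-D Gaussian with unknown mean: the score is the gradient of the log-likelihood with respect to the model parameter, and the kernel compares points through their scores, weighted by the inverse Fisher information.

```python
# Fisher kernel sketch for a toy generative model (our own illustration):
# given p(x | theta), define the score g(x) = d/dtheta log p(x | theta) and
# set k(x, x') = g(x)^T F^{-1} g(x'), where F is the Fisher information.
import numpy as np

mu, sigma = 0.0, 1.0       # model parameters; the mean plays the role of theta

def score(x):
    # d/dmu log N(x | mu, sigma^2) = (x - mu) / sigma^2
    return (x - mu) / sigma ** 2

fisher_info = 1.0 / sigma ** 2   # F = E[score(x)^2] for this model

def fisher_kernel(x, x_prime):
    return score(x) * (1.0 / fisher_info) * score(x_prime)

print(fisher_kernel(1.5, -0.5))  # similarity under the generative model
```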