The kernel trick
(50 minutes to learn)
Summary
We can use linear models to represent complex nonlinear functions by mapping the original data to a basis function representation. Such a representation can become unwieldy, however. The kernel trick lets us implicitly map the data to a very high-dimensional (possibly infinite-dimensional) space by replacing the dot product with a more general inner product, or kernel.
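To make the trick concrete, here is a minimal NumPy sketch showing that a degree-2 polynomial kernel computes the same inner product as an explicit basis function expansion, without ever constructing the expanded features. The feature map and example vectors are illustrative assumptions, not taken from the resources below.

```python
import numpy as np

# Explicit quadratic feature map for a 2-D input:
# phi([x1, x2]) = [x1^2, sqrt(2)*x1*x2, x2^2]
def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

# Polynomial kernel k(x, z) = (x . z)^2 computes the same
# inner product without ever forming phi(x) or phi(z).
def k(x, z):
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

print(np.dot(phi(x), phi(z)))  # 121.0
print(k(x, z))                 # 121.0 -- identical
```

For a d-dimensional input, the explicit degree-2 expansion has on the order of d^2 features, while the kernel evaluation stays O(d); this gap is what makes the implicit mapping worthwhile, and it becomes decisive when the feature space is infinite-dimensional (as with the RBF kernel).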
Context
This concept has the prerequisites:
- basis function expansions (The kernel trick is a way of compactly representing very high-dimensional basis function expansions.)
- positive definite matrices (The definition of a kernel requires that Gram matrices be positive semidefinite (PSD).)
- ridge regression (Kernel ridge regression is the standard example used to introduce kernels; see the sketch after this list.)
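Since kernel ridge regression is the standard introductory example, the following sketch may help fix ideas before reading the resources below. It is a minimal illustration, assuming an RBF kernel; the lengthscale, regularization value, and toy data are arbitrary choices for demonstration, not taken from the resources.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 * lengthscale^2))
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * lengthscale ** 2))

def kernel_ridge_fit(X, y, lam=0.1):
    # Dual solution: alpha = (K + lam * I)^{-1} y
    # Only the n x n Gram matrix K is needed, never the feature map itself.
    K = rbf_kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, alpha, X_test):
    # Predictions are kernel-weighted sums: f(x*) = sum_i alpha_i k(x_i, x*)
    return rbf_kernel(X_test, X_train) @ alpha

# Toy 1-D example: fit a nonlinear function with a model that is
# linear in the (implicit, infinite-dimensional) RBF feature space.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

alpha = kernel_ridge_fit(X, y)
X_test = np.linspace(-3, 3, 5)[:, None]
print(kernel_ridge_predict(X, alpha, X_test))
```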
Core resources (read/watch one of the following)
-Free-
→ Gaussian Processes for Machine Learning
A graduate-level machine learning textbook focusing on Gaussian processes.
Location:
Section 2.1, pages 7-12
-Paid-
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location:
Sections 6-6.1, pages 291-294
Supplemental resources (the following are optional, but you may find them useful)
-Free-
→ Coursera: Machine Learning (2013)
An online machine learning course aimed at a broad audience.
Location:
Lecture "Kernels I"
Other notes:
- Click on "Preview" to see the videos.
→ Bayesian Reasoning and Machine Learning
A textbook for a graduate machine learning course.
See also
- Techniques for constructing kernels
- The kernel trick is used in many machine learning algorithms, such as support vector machines and Gaussian processes.
- The theory of reproducing kernel Hilbert spaces justifies the use of kernelized representations.