# the kernel trick

(50 minutes to learn)

## Summary

We can use linear models to fit complex nonlinear functions by mapping the original data to a basis function representation. Such a representation can become unwieldy, however. The kernel trick allows us to implicitly map the data to a very high-dimensional (possibly infinite-dimensional) space by replacing the dot product with a more general inner product, or kernel.
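To make the "implicit mapping" concrete, here is a minimal sketch (not from any of the resources below) showing that a degree-2 polynomial kernel computes the same value as an explicit dot product in the space of degree-2 monomial features, without ever constructing those features:

```python
import numpy as np

def poly2_kernel(x, z):
    # k(x, z) = (x . z)**2, evaluated directly in the original 2-D space
    return np.dot(x, z) ** 2

def phi(x):
    # Explicit basis function expansion for 2-D input:
    # all degree-2 monomials, phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

print(poly2_kernel(x, z))      # (1*3 + 2*4)**2 = 121.0
print(np.dot(phi(x), phi(z)))  # same value via the explicit feature map
```

For a 2-D input the explicit map is only 3-dimensional, but for degree-d polynomials in many dimensions the feature space grows combinatorially while the kernel evaluation stays a single dot product and a power.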

## Context

This concept has the prerequisites:

- basis function expansions (The kernel trick is a way of compactly representing very high-dimensional basis function expansions.)
- positive definite matrices (The definition of a kernel involves PSD matrices.)
- ridge regression (Kernel ridge regression is the standard example used to introduce kernels.)
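Since kernel ridge regression is the standard introductory example, a minimal sketch may help (an illustrative implementation, not taken from the resources below; the RBF kernel and parameter values are arbitrary choices). With Gram matrix K and regularizer lam, the dual coefficients are alpha = (K + lam*I)^{-1} y, and predictions are f(x) = sum_i alpha_i k(x_i, x):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gram matrix of the RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0])

# Fit: solve (K + lam*I) alpha = y for the dual coefficients
lam = 1e-3
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Predict at a new point: f(x) = sum_i alpha_i k(x_i, x)
X_test = np.array([[0.5]])
pred = rbf_kernel(X_test, X) @ alpha
print(pred)  # close to sin(0.5) ≈ 0.479
```

Note that the model is expressed entirely through kernel evaluations between data points; the (here infinite-dimensional) feature space of the RBF kernel never appears explicitly.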

## Core resources (read/watch one of the following)

## -Free-

→ Gaussian Processes for Machine Learning

A graduate-level machine learning textbook focusing on Gaussian processes.

Location:
Section 2.1, pages 7-12

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location:
Sections 6-6.1, pages 291-294

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ Coursera: Machine Learning (2013)

An online machine learning course aimed at a broad audience.

Location:
Lecture "Kernels I"

Other notes:

- Click on "Preview" to see the videos.

→ Bayesian Reasoning and Machine Learning

A textbook for a graduate machine learning course.

## See also

- Techniques for constructing kernels
- The kernel trick is used in many machine learning algorithms.
- The theory of reproducing kernel Hilbert spaces justifies the use of kernelized representations.