# backpropagation

(2.2 hours to learn)

## Summary

Backpropagation is the standard algorithm for training supervised feed-forward neural nets. Strictly speaking, it is not a learning algorithm in itself, but a way of computing the gradient of the loss function with respect to the network parameters; an optimizer such as gradient descent then uses that gradient to update the weights. Mathematically, it's just an instance of the chain rule for derivatives, but it has an intuitive interpretation in terms of passing messages between the units.
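To make the "passing messages" picture concrete, here is a minimal sketch in plain Python. The network (one tanh hidden layer, a linear output, squared-error loss), the function names, and the learning rate are all assumptions chosen for illustration; none of them come from the resources listed on this page. The backward pass is checked against a finite-difference estimate, and the separate `sgd_step` function shows that the weight update is a distinct step from the gradient computation.

```python
import math
import random

def forward(params, x):
    """Forward pass: returns hidden activations and the scalar output."""
    W1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hj for w, hj in zip(w2, h)) + b2
    return h, y

def loss(params, x, t):
    """Squared-error loss for a single training example."""
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2

def backprop(params, x, t):
    """Gradient of the loss w.r.t. every parameter, via the chain rule."""
    W1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - t                                # dL/dy at the output unit
    grad_w2 = [dy * hj for hj in h]           # dL/dw2_j = dL/dy * h_j
    grad_b2 = dy
    # Message sent back to hidden unit j's pre-activation z_j:
    # dL/dz_j = dL/dy * w2_j * tanh'(z_j), with tanh'(z) = 1 - tanh(z)^2.
    dz = [dy * w * (1.0 - hj ** 2) for w, hj in zip(w2, h)]
    grad_W1 = [[dzj * xi for xi in x] for dzj in dz]
    grad_b1 = list(dz)
    return grad_W1, grad_b1, grad_w2, grad_b2

def numerical_grad_W1(params, x, t, j, i, eps=1e-6):
    """Central finite difference for dL/dW1[j][i]; used only as a check."""
    W1, b1, w2, b2 = params
    def loss_with(delta):
        W = [list(row) for row in W1]
        W[j][i] += delta
        return loss((W, b1, w2, b2), x, t)
    return (loss_with(eps) - loss_with(-eps)) / (2 * eps)

def sgd_step(params, grads, lr=0.05):
    """One gradient-descent update; this is the learning step, not backprop."""
    W1, b1, w2, b2 = params
    gW1, gb1, gw2, gb2 = grads
    new_W1 = [[w - lr * g for w, g in zip(row, grow)]
              for row, grow in zip(W1, gW1)]
    new_b1 = [b - lr * g for b, g in zip(b1, gb1)]
    new_w2 = [w - lr * g for w, g in zip(w2, gw2)]
    return new_W1, new_b1, new_w2, b2 - lr * gb2

# Hypothetical toy setup: 2 inputs, 3 hidden units, one training example.
random.seed(0)
n_in, n_hidden = 2, 3
W1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [random.uniform(-1, 1) for _ in range(n_hidden)]
params = (W1, b1, w2, 0.0)
x, t = [0.5, -1.0], 0.25

grads = backprop(params, x, t)
analytic = grads[0][0][0]
numeric = numerical_grad_W1(params, x, t, 0, 0)
loss_before = loss(params, x, t)
loss_after = loss(sgd_step(params, grads), x, t)

print("analytic vs numeric gradient:", analytic, numeric)
print("loss before and after one step:", loss_before, loss_after)
```

The list `dz` plays the role of the backward messages: each hidden unit receives the output error scaled by its outgoing weight and its own local derivative. A real implementation would vectorize these loops and process mini-batches.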

## Context

This concept has the prerequisites:

- feed-forward neural nets (Backpropagation is an algorithm for training feed-forward neural nets.)
- stochastic gradient descent (Backpropagation computes the gradients that gradient descent uses to update the weights.)
- chain rule (Backpropagation follows from the chain rule for Jacobian matrices.)
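As a rough guide to how the chain-rule prerequisite is used, the recursion can be written out for a generic layered network. The notation here is an assumption made for this sketch, not taken from any of the resources below: layer $l$ has pre-activations $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$, activations $a^{(l)} = \phi(z^{(l)})$ for an elementwise nonlinearity $\phi$, and $\mathcal{L}$ is the loss.

```latex
% Jacobian of one layer's pre-activations with respect to the previous layer's
\frac{\partial z^{(l+1)}}{\partial z^{(l)}}
  = W^{(l+1)} \, \mathrm{diag}\!\left(\phi'(z^{(l)})\right)

% Chain rule: the backward message \delta^{(l)} = \partial \mathcal{L} / \partial z^{(l)}
\delta^{(l)}
  = \left(\frac{\partial z^{(l+1)}}{\partial z^{(l)}}\right)^{\!\top} \delta^{(l+1)}
  = \mathrm{diag}\!\left(\phi'(z^{(l)})\right) \left(W^{(l+1)}\right)^{\!\top} \delta^{(l+1)}

% Parameter gradients read off from the messages
\frac{\partial \mathcal{L}}{\partial W^{(l)}} = \delta^{(l)} \left(a^{(l-1)}\right)^{\!\top},
\qquad
\frac{\partial \mathcal{L}}{\partial b^{(l)}} = \delta^{(l)}
```

Starting from $\delta$ at the output layer and applying the recursion layer by layer gives every gradient in a single backward sweep, at a cost comparable to the forward pass.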

## Core resources (read/watch one of the following)

## -Free-

→ Coursera: Machine Learning

An online machine learning course aimed at advanced undergraduates.

Other notes:

- Click on "Preview" to see the videos.

→ Coursera: Machine Learning (2013)

An online machine learning course aimed at a broad audience.

Location:
Lecture series "Neural networks: learning"

Other notes:

- Click on "Preview" to see the videos.

→ Coursera: Neural Networks for Machine Learning (2012)

An online course by Geoff Hinton, who invented many of the core ideas behind neural nets and deep learning.

Location:
- Lecture "Learning the weights of a logistic output neuron"
- Lecture "The backpropagation algorithm"
- Lecture "Using the derivatives computed by backpropagation"

→ The Elements of Statistical Learning

A graduate-level statistical learning textbook with a focus on frequentist methods.

Other notes:

- Sections 11.5-11.7 discuss practical issues and examples in training neural nets.

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location:
Sections 5.2-5.3, pages 232-249

## Supplemental resources (the following are optional, but you may find them useful)

## -Paid-

→ Artificial Intelligence: A Modern Approach

A textbook giving a broad overview of all of AI.

Location:
Section 20.5, subsection "Multilayer feed-forward neural networks," pages 744-748

## See also

- Since backpropagation is basically a way of computing gradients, it can also be used in quasi-Newton methods, not just gradient descent.
- Backpropagation can be used to compute second derivatives as well.
- Unfortunately, training neural nets is not a convex optimization problem, so it suffers from local optima and plateaus.
- Generative pre-training is one strategy for getting around these local optima.