(2.2 hours to learn)
Backpropagation is the standard algorithm for training supervised feed-forward neural nets. More precisely, it isn't actually a learning algorithm, but a way of computing the gradient of the loss function with respect to the network parameters. Mathematically, it's just an instance of the chain rule for derivatives, but it has an intuitive interpretation in terms of passing messages between the units.
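Since backpropagation is just the chain rule with cached intermediate values, it can be sketched in a few lines. The following toy example (all names and numbers are illustrative, not taken from any of the resources below) trains nothing; it only computes the gradient of a squared loss for a one-hidden-unit network and checks it against a finite-difference approximation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(w1, w2, x, t):
    # Forward pass: store the intermediate activations we will reuse.
    h = sigmoid(w1 * x)           # hidden activation
    y = sigmoid(w2 * h)           # output
    loss = 0.5 * (y - t) ** 2

    # Backward pass: one chain-rule factor per layer, passed back as a "message".
    dL_dy = y - t
    dy_dz2 = y * (1 - y)          # derivative of the output sigmoid
    dL_dw2 = dL_dy * dy_dz2 * h
    dL_dh = dL_dy * dy_dz2 * w2   # message sent back to the hidden unit
    dh_dz1 = h * (1 - h)
    dL_dw1 = dL_dh * dh_dz1 * x
    return loss, dL_dw1, dL_dw2

loss, g1, g2 = forward_backward(w1=0.5, w2=-0.3, x=1.0, t=1.0)

# Sanity check: compare dL/dw1 against a central finite difference.
eps = 1e-6
lp, _, _ = forward_backward(0.5 + eps, -0.3, 1.0, 1.0)
lm, _, _ = forward_backward(0.5 - eps, -0.3, 1.0, 1.0)
num_g1 = (lp - lm) / (2 * eps)
print(abs(g1 - num_g1) < 1e-8)  # analytic and numerical gradients agree
```

The forward pass caches each unit's activation; the backward pass reuses those cached values, which is what makes backprop cheaper than differentiating each weight independently.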
This concept has the prerequisites:
Core resources (read/watch one of the following)
→ Coursera: Machine Learning (2013)
An online machine learning course aimed at a broad audience.
Location: Lecture series "Neural networks: learning"
- Click on "Preview" to see the videos.
→ Coursera: Neural Networks for Machine Learning (2012)
An online course by Geoff Hinton, who invented many of the core ideas behind neural nets and deep learning.
- Lecture "Learning the weights of a logistic output neuron"
- Lecture "The backpropagation algorithm"
- Lecture "Using the derivatives computed by backpropagation"
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
- Sections 11.5-11.7 discuss practical issues and examples in training neural nets.
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Sections 5.2-5.3, pages 232-249
Supplemental resources (the following are optional, but you may find them useful)
→ Artificial Intelligence: a Modern Approach
A textbook giving a broad overview of all of AI.
Location: Section 20.5, subsection "Multilayer feed-forward neural networks," pages 744-748
- Since backpropagation is basically a way of computing gradients, it can also be used in quasi-Newton methods, not just gradient descent.
- Backpropagation can be used to compute second derivatives as well.
- Unfortunately, training neural nets is not a convex optimization problem, so it suffers from local optima and plateaus.
- Generative pre-training is one strategy for getting around these local optima.
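To illustrate the first point above, here is a minimal sketch (values and names are invented for illustration) in which the backprop-computed gradients are consumed by a plain gradient-descent loop; the same gradients could equally be handed to a quasi-Newton optimizer:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grads(w1, w2, x, t):
    # Same toy one-hidden-unit network: y = sigmoid(w2 * sigmoid(w1 * x)).
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    loss = 0.5 * (y - t) ** 2
    # Backprop: chain rule, sharing the common factor d across both weights.
    d = (y - t) * y * (1 - y)
    return loss, d * w2 * h * (1 - h) * x, d * h

w1, w2 = 0.1, 0.1
lr = 1.0
first_loss, _, _ = loss_and_grads(w1, w2, x=1.0, t=1.0)
for _ in range(200):
    _, g1, g2 = loss_and_grads(w1, w2, x=1.0, t=1.0)
    w1 -= lr * g1   # gradient-descent update
    w2 -= lr * g2
final_loss, _, _ = loss_and_grads(w1, w2, x=1.0, t=1.0)
print(final_loss < first_loss)  # the loss decreased
```

Note that backprop itself only produces the gradients `g1, g2`; the update rule in the loop is a separate choice, which is exactly why the same machinery plugs into other optimizers.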