backpropagation for second-order methods
(1.6 hours to learn)
Backpropagation is normally used to propagate first-order derivatives (gradients). However, it can also be used to propagate second-order derivatives, at least approximately.
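For instance, exact Hessian-vector products can be computed with two backward passes (the "fast multiplication by the Hessian" trick due to Pearlmutter, covered at the end of the Bishop section below). A minimal JAX sketch, where the objective f and the names hvp, w, v are illustrative stand-ins rather than anything from the source:

```python
import jax
import jax.numpy as jnp

def f(w):
    # toy scalar objective standing in for a network's training loss
    return jnp.sum(jnp.tanh(w) ** 2)

def hvp(f, w, v):
    # Hessian-vector product via double backprop: differentiate the
    # directional derivative grad(f)(w) . v, never forming the Hessian
    return jax.grad(lambda u: jnp.vdot(jax.grad(f)(u), v))(w)

w = jnp.arange(3.0)
v = jnp.ones(3)
print(hvp(f, w, v))  # matches jax.hessian(f)(w) @ v
```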
This concept has the prerequisites:
- matrix inverse (We often want to approximate the inverse Hessian rather than the Hessian itself.)
Core resources (read/watch one of the following)
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Section 5.4, pages 249-256
- Computing second derivatives explicitly can be too expensive in high dimensions. However, backpropagation can still be used with quasi-Newton methods, which require only gradients (see the BFGS sketch after this list).
- Some features of neural nets that make second-order training difficult:
  - the objective function is not convex, so the Hessian might not be positive semi-definite (PSD); see the Gauss-Newton sketch after this list
  - some commonly used nonlinearities are not differentiable everywhere (e.g. ReLU at zero)
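On the quasi-Newton point: BFGS maintains an estimate of the inverse Hessian built purely from successive gradient evaluations, so backprop's gradients are all it needs. A minimal sketch using jax.scipy.optimize.minimize on a toy objective; the loss here is an illustrative stand-in, not a real network:

```python
import jax.numpy as jnp
from jax.scipy.optimize import minimize

def loss(w):
    # toy nonconvex objective standing in for a training loss
    return jnp.sum((jnp.tanh(w) - 0.5) ** 2) + 0.01 * jnp.sum(w ** 2)

w0 = jnp.zeros(4)
res = minimize(loss, w0, method="BFGS")  # builds an inverse-Hessian
                                         # estimate from gradients alone
print(res.x, res.fun)
```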
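On the PSD point: one standard workaround, the outer-product (Gauss-Newton) approximation from the same Bishop section, replaces the Hessian with J^T J, which is PSD by construction even when the true Hessian is not. A sketch assuming a toy tanh model; residuals, X, and t are hypothetical:

```python
import jax
import jax.numpy as jnp

def residuals(w, X, t):
    # toy model: tanh(Xw) predictions minus targets
    return jnp.tanh(X @ w) - t

def gauss_newton_hessian(w, X, t):
    J = jax.jacobian(residuals)(w, X, t)  # Jacobian of the residuals
    return J.T @ J                        # PSD by construction

X = jax.random.normal(jax.random.PRNGKey(0), (10, 3))
t = jnp.ones(10)
w = jnp.zeros(3)
print(jnp.linalg.eigvalsh(gauss_newton_hessian(w, X, t)))  # all >= 0
```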