backpropagation for second-order methods
(1.6 hours to learn)
Summary
Backpropagation is normally used to propagate first-order derivatives (gradients). However, it can also be used to propagate second-order derivatives, at least approximately.
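One standard way to do this is to propagate Hessian-vector products rather than the full Hessian: differentiating the inner product of the gradient with a fixed vector gives H v using two reverse-mode passes, without ever forming H. Below is a minimal sketch of this "double backprop" trick in JAX; the toy loss function and the chosen direction vector are arbitrary illustrations, not part of the source material.

```python
import jax
import jax.numpy as jnp

# Toy objective standing in for a network's training loss (hypothetical).
def loss(w):
    return jnp.sum(jnp.tanh(w) ** 2) + 0.5 * jnp.sum(w ** 2)

def hvp(f, w, v):
    # Hessian-vector product: H v = d/dw [ grad f(w) . v ].
    # Both differentiations are done by reverse-mode autodiff (backprop),
    # so the full Hessian is never materialized.
    return jax.grad(lambda w: jnp.vdot(jax.grad(f)(w), v))(w)

w = jnp.arange(1.0, 4.0)        # parameters
v = jnp.array([1.0, 0.0, 0.0])  # direction along which curvature is probed

print(hvp(loss, w, v))           # curvature information from two backward passes
print(jax.hessian(loss)(w) @ v)  # explicit Hessian, for comparison only
```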
Context
This concept has the prerequisites:
- backpropagation
- matrix inverse (We often want to approximate the inverse Hessian rather than the Hessian itself.)
Core resources (read/watch one of the following)
(Paid)
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location:
Section 5.4, pages 249-256
See also
- Computing second derivatives explicitly can be prohibitively expensive in high dimensions. However, backpropagation can still be used with quasi-Newton methods, which require only gradients (see the sketch after this list).
- Some features of neural nets that make second-order training difficult:
  - the objective function is not convex, so the Hessian might not be positive semidefinite (PSD)
  - some commonly used nonlinearities (e.g. ReLU) are not differentiable everywhere
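As a minimal sketch of the quasi-Newton point above: backprop supplies only gradients, and L-BFGS builds its own low-rank approximation to the inverse Hessian from successive gradient evaluations. The toy loss, parameter dimension, and use of SciPy's optimizer here are illustrative assumptions, not prescribed by the source.

```python
import jax
import jax.numpy as jnp
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy objective standing in for a network's training loss.
def loss(w):
    return jnp.sum(jnp.tanh(w) ** 2) + 0.1 * jnp.sum(w ** 2)

# Reverse-mode autodiff (backprop) provides value and gradient in one pass.
value_and_grad = jax.jit(jax.value_and_grad(loss))

def fun_and_jac(w_np):
    # SciPy passes float64 NumPy arrays; convert to and from JAX arrays.
    val, g = value_and_grad(jnp.asarray(w_np))
    return float(val), np.asarray(g, dtype=np.float64)

w0 = np.random.default_rng(0).normal(size=5)
# L-BFGS-B is a quasi-Newton method: it never sees second derivatives,
# only the gradients computed by backprop.
result = minimize(fun_and_jac, w0, jac=True, method="L-BFGS-B")
print(result.x, result.fun)
```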