backpropagation for second-order methods
(1.6 hours to learn)
Backpropagation is normally used to propagate first-order derivatives (gradients). However, it can also be used to propagate second-order derivatives, at least approximately.
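One approximate route is to difference two ordinary backpropagation passes, which gives a Hessian-vector product without ever forming the Hessian. The sketch below assumes a toy model that is not from the source: a single tanh hidden unit, y = w2 * tanh(w1 * x), with squared-error loss. The names `grad` and `hessian_vector_product` and the step size `eps` are illustrative choices only.

```python
import math

def grad(w, x, t):
    """Backpropagation for y = w2 * tanh(w1 * x) with loss 0.5 * (y - t)**2."""
    w1, w2 = w
    # Forward pass
    h = math.tanh(w1 * x)
    y = w2 * h
    # Backward pass
    dy = y - t                      # dL/dy
    dw2 = dy * h                    # dL/dw2
    dh = dy * w2                    # dL/dh
    dw1 = dh * (1.0 - h * h) * x    # dL/dw1, using tanh' = 1 - tanh^2
    return (dw1, dw2)

def hessian_vector_product(w, v, x, t, eps=1e-4):
    """Approximate H v by a central difference of two backprop gradients."""
    w_plus = tuple(wi + eps * vi for wi, vi in zip(w, v))
    w_minus = tuple(wi - eps * vi for wi, vi in zip(w, v))
    g_plus = grad(w_plus, x, t)
    g_minus = grad(w_minus, x, t)
    return tuple((gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus))

w = (0.5, -1.0)
x, t = 2.0, 0.3
print(hessian_vector_product(w, (1.0, 0.0), x, t))  # first column of the Hessian
print(hessian_vector_product(w, (0.0, 1.0), x, t))  # second column of the Hessian
```

Each product costs two gradient evaluations, so the cost grows with the number of directions probed rather than with the square of the number of weights.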
This concept has the prerequisites:
Core resources (read/watch one of the following)
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Section 5.4, pages 249-256
- Computing second derivatives explicitly can be too expensive in high dimensions. However, backpropagation can still be used with quasi-Newton methods, which only require gradients.
- Some features of neural nets that make second-order training difficult:
  - the objective function is not convex, so the Hessian might not be positive semidefinite (PSD)
  - some commonly used nonlinearities are not differentiable everywhere (for example, the rectified linear unit at zero)
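The first two points above can be seen together in a small sketch. It assumes a toy objective that is not from the source, 0.5 * (w1 * w2 - 1)^2, which is the loss of a two-weight linear "network" with input and target fixed to 1. The function names, the starting point, and the Armijo constant are hypothetical choices. The optimizer is a plain BFGS update with a backtracking line search: it only ever calls the gradient routine, and it skips the update whenever the curvature condition fails, which is one common way of coping with a Hessian that is not PSD.

```python
import math

def loss_and_grad(w):
    """Loss 0.5 * (w1 * w2 - 1)**2 and its gradient (a tiny backprop pass)."""
    w1, w2 = w
    r = w1 * w2 - 1.0                       # forward pass: output minus target
    return 0.5 * r * r, [r * w2, r * w1]    # backward pass

def hessian(w1, w2):
    """Exact Hessian entries (h11, h12, h22) of the loss above."""
    return w2 * w2, 2.0 * w1 * w2 - 1.0, w1 * w1

def eigenvalues_2x2(h11, h12, h22):
    """Eigenvalues of the symmetric matrix [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius, mean + radius

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def bfgs(f, w, iters=50, tol=1e-10):
    """BFGS with backtracking line search; uses gradients only."""
    n = len(w)
    # Running estimate of the inverse Hessian, started at the identity
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    loss, g = f(w)
    skipped = 0
    for _ in range(iters):
        if dot(g, g) < tol:
            break
        d = [-di for di in matvec(H, g)]    # quasi-Newton search direction
        step, slope = 1.0, dot(g, d)
        while True:                         # backtracking (Armijo) line search
            w_new = [wi + step * di for wi, di in zip(w, d)]
            loss_new, g_new = f(w_new)
            if loss_new <= loss + 1e-4 * step * slope or step < 1e-12:
                break
            step *= 0.5
        s = [a - b for a, b in zip(w_new, w)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 1e-12:                      # curvature condition holds: update H
            Hy = matvec(H, y)
            yHy = dot(y, Hy)
            for i in range(n):
                for j in range(n):
                    H[i][j] += ((sy + yHy) * s[i] * s[j] / (sy * sy)
                                - (Hy[i] * s[j] + s[i] * Hy[j]) / sy)
        else:                               # negative curvature seen: keep old H
            skipped += 1
        w, loss, g = w_new, loss_new, g_new
    return w, skipped

print(eigenvalues_2x2(*hessian(0.3, 0.2)))  # one negative, one positive eigenvalue
w_final, skipped = bfgs(loss_and_grad, [0.3, 0.2])
print(w_final, skipped)
```

Skipping the update is only one possible safeguard; damped updates or a line search that enforces the Wolfe conditions are alternatives, and nothing here is specific to how any particular library handles the problem.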