backpropagation for second-order methods

(1.6 hours to learn)

Summary

Backpropagation is normally used to propagate first-order derivatives (gradients). However, it can also be used to propagate second-order derivatives, at least approximately.
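As a concrete illustration (a minimal sketch, assuming JAX; the toy objective f and vector v are invented for the example), a Hessian-vector product can be propagated with two backward passes, i.e. "double backprop", without ever forming the full Hessian:

    import jax
    import jax.numpy as jnp

    def f(x):
        # invented toy objective standing in for a network's training loss
        return jnp.sum(jnp.tanh(x) ** 2)

    def hvp(f, x, v):
        # the inner jax.grad is one backward pass; differentiating the scalar
        # grad_f(x) . v is a second backward pass ("double backprop")
        return jax.grad(lambda x: jnp.vdot(jax.grad(f)(x), v))(x)

    x = jnp.arange(3.0)
    v = jnp.ones(3)
    print(hvp(f, x, v))           # Hessian-vector product at roughly the cost of a gradient
    print(jax.hessian(f)(x) @ v)  # same product via the explicit Hessian (only feasible in low dimensions)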

Context

This concept has the prerequisites:

Core resources (read/watch one of the following)


See also

  • Computing second derivatives explicitly can be too expensive in high dimensions. However, backpropagation can still be used with quasi-Newton methods, which only require gradients (see the first sketch below).
  • Some features of neural nets that make second-order training difficult:
    • the objective function is not convex, so the Hessian might not be positive semidefinite (PSD)
    • some commonly used nonlinearities are not differentiable
  • Hessian-free optimization is a second-order method that has been successfully applied to training deep neural nets; it touches the curvature only through matrix-vector products, which backpropagation can supply (see the second sketch below).
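To make the quasi-Newton point concrete, here is a minimal sketch using JAX's built-in BFGS (jax.scipy.optimize.minimize is the real API; the toy objective and starting point are invented): the optimizer builds its curvature estimate entirely from gradients, and those gradients come from ordinary backprop.

    import jax.numpy as jnp
    from jax.scipy.optimize import minimize

    def loss(w):
        # invented convex toy objective; BFGS obtains its gradient by backprop
        return jnp.sum(jnp.log(jnp.cosh(w - 1.0)))

    w0 = jnp.zeros(3)
    result = minimize(loss, w0, method="BFGS")  # quasi-Newton: gradients only, no Hessian
    print(result.x)    # approaches [1, 1, 1]
    print(result.fun)  # approaches 0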
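The Hessian-free idea can be sketched the same way (again assuming JAX; the objective, damping constant, and iteration counts are invented): solve (H + λI)d = -g with conjugate gradient, touching the Hessian only through backprop-computed Hessian-vector products. The damping λ is there precisely because the Hessian of a non-convex objective need not be PSD.

    import jax
    import jax.numpy as jnp
    from jax.scipy.sparse.linalg import cg

    def f(x):
        # invented non-convex toy objective (its Hessian can be indefinite)
        return jnp.sum(jnp.tanh(x) ** 2)

    def hvp(f, x, v):
        # Hessian-vector product via double backprop, as in the first sketch
        return jax.grad(lambda x: jnp.vdot(jax.grad(f)(x), v))(x)

    def hessian_free_step(f, x, damping=1.0, cg_iters=20):
        g = jax.grad(f)(x)
        # CG sees the damped Hessian only through matrix-vector products
        matvec = lambda v: hvp(f, x, v) + damping * v
        d, _ = cg(matvec, -g, maxiter=cg_iters)
        return x + d

    x = jnp.array([0.5, -1.0, 2.0])
    for _ in range(10):
        x = hessian_free_step(f, x)
    print(x, f(x))  # the loss shrinks toward the minimum at x = 0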