backpropagation for second-order methods

Backpropagation is normally used to propagate first-order derivatives (gradients). However, it can also be used to propagate second-order derivatives, at least approximately.
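
As a rough illustration of propagating second-order information, the sketch below differentiates an ordinary backpropagation pass along a direction v, which yields an exact Hessian-vector product without ever forming the Hessian. Everything in it is an assumption made for illustration: the tiny one-hidden-unit network, the input x = 0.5, the target t = 1.0, and the names Dual, backprop_grad and hessian_vector_product.

```python
import math


class Dual:
    """Number val + tan*eps with eps**2 == 0; `tan` carries a directional derivative."""

    def __init__(self, val, tan=0.0):
        self.val = val
        self.tan = tan

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # product rule for the derivative part
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def tanh(z):
    """tanh that accepts either a float or a Dual."""
    if isinstance(z, Dual):
        t = math.tanh(z.val)
        return Dual(t, (1.0 - t * t) * z.tan)
    return math.tanh(z)


def backprop_grad(w1, w2, x=0.5, t=1.0):
    """Gradient of 0.5 * (w2 * tanh(w1 * x) - t)**2 by ordinary backpropagation."""
    # forward pass
    h = tanh(w1 * x)
    y = w2 * h
    # backward pass
    dy = y - t
    dw2 = dy * h
    dh = dy * w2
    dz = dh * (1.0 - h * h)
    dw1 = dz * x
    return dw1, dw2


def hessian_vector_product(w, v):
    """Exact H @ v, obtained by differentiating the backprop pass along v."""
    g1, g2 = backprop_grad(Dual(w[0], v[0]), Dual(w[1], v[1]))
    return g1.tan, g2.tan


hv = hessian_vector_product((0.3, -0.7), (1.0, 2.0))
print(hv)
```

The cost is a small constant multiple of one gradient evaluation, which is why products of this kind stay affordable even when the full Hessian does not.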


  • Computing the full matrix of second derivatives (the Hessian) explicitly can be too expensive in high dimensions, since it has one entry per pair of parameters. However, backpropagation can still be used with quasi-Newton methods, which only require gradients.
  • Some features of neural nets that make second-order training difficult:
    • the objective function is not convex, so the Hessian might not be positive semidefinite (PSD)
    • some commonly used nonlinearities are not differentiable everywhere (e.g. the rectified linear unit at zero)
  • Hessian-free optimization is a second-order method which has been successfully applied to training deep neural nets.
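
Two of the points above can be checked on a toy problem: curvature information can be estimated from gradients alone, and the Hessian of a non-convex objective need not be PSD. The sketch below is a hedged illustration, not Hessian-free optimization itself. The objective f(w1, w2) = 0.5 * (w1 * w2 - 1)^2, the step size eps, and the names grad, hvp_from_gradients and curvature are all assumptions chosen for the example.

```python
def grad(w):
    """Gradient of f(w1, w2) = 0.5 * (w1 * w2 - 1)**2, a two-weight linear 'network'."""
    w1, w2 = w
    r = w1 * w2 - 1.0  # residual
    return (r * w2, r * w1)


def hvp_from_gradients(grad_fn, w, v, eps=1e-4):
    """Approximate H @ v by a central difference of two gradient evaluations."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad_fn(w_plus)
    g_minus = grad_fn(w_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


def curvature(grad_fn, w, v):
    """Directional curvature v^T H v; a negative value means H is not PSD at w."""
    hv = hvp_from_gradients(grad_fn, w, v)
    return sum(vi * hi for vi, hi in zip(v, hv))


w = [0.0, 0.0]
print(curvature(grad, w, [1.0, 1.0]))   # close to -2: negative curvature
print(curvature(grad, w, [1.0, -1.0]))  # close to +2: positive curvature
```

At w = (0, 0) the exact Hessian of this objective is [[0, -1], [-1, 0]], with eigenvalues +1 and -1, so an unmodified Newton step is not guaranteed to be a descent direction there.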