# backpropagation for second-order methods

(1.6 hours to learn)

## Summary

Backpropagation is normally used to propagate first-order derivatives (gradients). However, it can also be used to propagate second-order derivatives, at least approximately.
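One way to see this: the Hessian-vector product Hv is the directional derivative of the gradient, so it can be approximated with two gradient (backprop) passes and a finite difference. A minimal pure-Python sketch on a toy function, with an analytic gradient standing in for a backprop pass (the function and names are illustrative, not from the source):

```python
# Toy function f(w0, w1) = w0**2 * w1 + w1**3
def grad(w):
    """Analytic gradient of f (stand-in for one backprop pass)."""
    w0, w1 = w
    return [2.0 * w0 * w1, w0 ** 2 + 3.0 * w1 ** 2]

def hvp(w, v, eps=1e-6):
    """Approximate the Hessian-vector product H v using two gradient
    evaluations: H v ~= (g(w + eps*v) - g(w)) / eps."""
    g0 = grad(w)
    g1 = grad([wi + eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / eps for a, b in zip(g1, g0)]

w = [1.0, 2.0]
v = [1.0, 0.0]
# Analytic Hessian at w is [[2*w1, 2*w0], [2*w0, 6*w1]] = [[4, 2], [2, 12]],
# so H v should be approximately [4, 2].
print(hvp(w, v))
```

Exact (rather than finite-difference) Hessian-vector products can be computed with a second backward pass through the gradient computation, which is what Bishop's treatment covers.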

## Context

This concept has the prerequisites:

- backpropagation
- matrix inverse (We often want to approximate the inverse Hessian rather than the Hessian itself.)
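The inverse appears because Newton-type updates move against the gradient rescaled by the inverse Hessian of the error function (notation here follows standard usage, not the source):

```latex
\Delta \mathbf{w} = -\mathbf{H}^{-1} \nabla E(\mathbf{w}),
\qquad \mathbf{H} = \nabla \nabla E(\mathbf{w})
```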

## Core resources (read/watch one of the following)

### Paid

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location: Section 5.4, pages 249-256

## See also

- Computing the full matrix of second derivatives explicitly can be too expensive in high dimensions. However, backpropagation can still be used within quasi-Newton methods, which build up curvature estimates from gradients alone.
- Some features of neural nets make second-order training difficult:
    - the objective function is not convex, so the Hessian need not be positive semidefinite (PSD)
    - some commonly used nonlinearities (e.g. ReLU) are not differentiable everywhere
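The quasi-Newton idea can be illustrated in one dimension, where the secant approximation estimates curvature from the change in successive gradients, so only gradient evaluations are needed. A hedged sketch on a made-up function (all names here are illustrative):

```python
def grad(w):
    """Gradient of f(w) = w**4 - 3*w (the only derivative we ever compute)."""
    return 4.0 * w ** 3 - 3.0

def secant_newton(w0, w1, steps=30):
    """1-D quasi-Newton method: estimate the second derivative from the
    change in gradients between iterates, then take a Newton-like step."""
    g0, g1 = grad(w0), grad(w1)
    for _ in range(steps):
        if abs(g1) < 1e-10 or w1 == w0:
            break                        # converged; avoid a 0/0 secant
        h = (g1 - g0) / (w1 - w0)        # secant estimate of the curvature
        w0, g0 = w1, g1
        w1 = w1 - g1 / h                 # Newton step with approximate curvature
        g1 = grad(w1)
    return w1

# The minimizer of w**4 - 3*w is (3/4)**(1/3), roughly 0.9086.
print(secant_newton(0.0, 1.0))
```

Practical quasi-Newton methods such as BFGS and L-BFGS generalize this idea to many dimensions by maintaining a low-rank approximation to the inverse Hessian, updated from gradient differences.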