# gradient descent

(50 minutes to learn)

## Summary

Gradient descent, also known as steepest descent, is an iterative optimization algorithm for finding a local minimum of a differentiable function. At each iteration, it moves the current solution a short distance, set by a step size, in the direction of the negative gradient of the function (the direction of "steepest descent").
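
As a rough illustration of the iteration described above, here is a minimal Python sketch. It assumes a fixed step size and a gradient written by hand. The names `gradient_descent`, `grad_f`, `step_size`, `max_iters`, and `tol` are illustrative choices, not taken from any library or from the resources listed below.

```python
def gradient_descent(grad_f, x0, step_size=0.1, max_iters=1000, tol=1e-8):
    """Minimize a function given its gradient, using a fixed step size."""
    x = list(x0)
    for _ in range(max_iters):
        g = grad_f(x)
        # Stop when the gradient is (nearly) zero: x is a stationary point.
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Move against the gradient, the direction of steepest descent.
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


# Example objective (an assumption for illustration):
# f(x, y) = (x - 1)^2 + 2 * (y + 3)^2, which is minimized at (1, -3).
def grad_f(p):
    x, y = p
    return [2 * (x - 1), 4 * (y + 3)]


minimum = gradient_descent(grad_f, [0.0, 0.0])
```

For this quadratic, `minimum` ends up very close to `[1, -3]`. In practice the step size usually needs tuning or a line search, and a fixed value like the one here is only a convenient default for the sketch.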

## Context

This concept has the prerequisites:

- gradient
- functions of several variables (gradient descent is typically applied to functions of several variables)

## Goals

- Be able to apply gradient descent to functions of several variables

- Why is gradient descent not guaranteed to find the global optimum?

- Under what conditions (for example, on the step size) is gradient descent guaranteed to converge? What can we say about the solution it obtains?
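
The questions in these goals can be explored numerically. The sketch below is a hedged illustration under assumed example functions, not a proof: it runs one-dimensional gradient descent on a non-convex function from two starting points, then on a simple quadratic with a small and a large step size. The helper `descend` and the functions `f` and `grad_f` are hypothetical names chosen for this example.

```python
def descend(grad, x0, step_size, num_iters):
    """Run a fixed number of 1-D gradient descent steps from x0."""
    x = x0
    for _ in range(num_iters):
        x = x - step_size * grad(x)
    return x


# f(x) = x^4 - 3x^2 + x has two local minima, near x = -1.30 and x = 1.13.
# The one near -1.30 has the lower function value, so it is the global minimum.
def f(x):
    return x**4 - 3 * x**2 + x


def grad_f(x):
    return 4 * x**3 - 6 * x + 1


# The starting point decides which minimum the iterates settle into.
left = descend(grad_f, x0=-2.0, step_size=0.01, num_iters=1000)
right = descend(grad_f, x0=2.0, step_size=0.01, num_iters=1000)

# On g(x) = x^2 (gradient 2x), each step multiplies x by (1 - 2 * step_size),
# so the iterates shrink for step_size = 0.1 and grow for step_size = 1.1.
small_step = descend(lambda x: 2 * x, x0=1.0, step_size=0.1, num_iters=50)
large_step = descend(lambda x: 2 * x, x0=1.0, step_size=1.1, num_iters=50)
```

Both `left` and `right` end at points where the gradient is close to zero, but only `left` is the global minimum, since `f(left) < f(right)`. The second pair of runs suggests why convergence depends on the step size: `small_step` approaches 0, while `large_step` moves further from the minimum with every iteration.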

## Core resources (read/watch one of the following)

## -Free-

→ Convex Optimization

→ Coursera: Machine Learning (2013)

An online machine learning course aimed at a broad audience.

Other notes:

- Click on "Preview" to see the videos.

→ Wikipedia

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ Coursera: Machine Learning

An online machine learning course aimed at advanced undergraduates.

Additional dependencies:

- perceptron algorithm

Other notes:

- Click on "Preview" to see the videos.

→ Bayesian Reasoning and Machine Learning

## See also

- stochastic gradient descent is an extension of gradient descent that can typically be applied to much larger data sets
- Newton's method is a common alternative to gradient descent