stochastic gradient descent
(1.5 hours to learn)
Stochastic gradient descent (SGD) is an iterative optimization algorithm for objective functions that can be written as a sum (more generally, a linear combination) of differentiable functions. Such objectives often arise when the full objective is a sum of per-data-point objectives, e.g. a least squares objective. While batch gradient descent uses the full gradient of the objective at every step, SGD approximates the full gradient with the gradient of a single term in the sum, e.g. the gradient of the objective at one data point, and takes a step for each term in turn. SGD is often used to optimize non-convex functions, e.g. those that arise in neural networks.
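The contrast between the two methods can be sketched on a small least squares problem. This is only an illustrative sketch: the synthetic data, the function names (`batch_gradient`, `example_gradient`, `run_batch_gd`, `run_sgd`), and the learning rates are assumptions chosen for the example, not part of any particular course or library.

```python
import numpy as np

def batch_gradient(w, X, y):
    """Gradient of the mean squared error (1/n) * sum_i (x_i . w - y_i)^2."""
    n = X.shape[0]
    return 2.0 * X.T @ (X @ w - y) / n

def example_gradient(w, x_i, y_i):
    """Gradient of the single term (x_i . w - y_i)^2."""
    return 2.0 * x_i * (x_i @ w - y_i)

def run_batch_gd(X, y, lr=0.1, steps=200):
    """Batch gradient descent: every step uses all data points."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * batch_gradient(w, X, y)
    return w

def run_sgd(X, y, lr=0.01, epochs=50, seed=0):
    """SGD: every step uses one data point, visited in a shuffled order."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(X.shape[0]):
            w -= lr * example_gradient(w, X[i], y[i])
    return w

# Synthetic data (assumed for illustration): y = X w_true + small noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)

w_batch = run_batch_gd(X, y)
w_sgd = run_sgd(X, y)
print("batch GD:", w_batch)
print("SGD:     ", w_sgd)
```

In this sketch each SGD update costs one data point instead of the whole data set, which is why a smaller learning rate and many more (cheaper) steps are used.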
This concept has the prerequisites:
Goals
- Understand the difference between stochastic gradient descent and batch gradient descent.
- When does stochastic gradient descent provide a reasonable approximation to the full gradient?
- What is the interpretation of the learning rate for stochastic gradient descent compared to full gradient descent?
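One way to explore the questions above numerically is to compare gradients computed on random subsets of the data with the full gradient. The sketch below assumes a least squares objective on synthetic data; the helper `minibatch_error` and all sizes are hypothetical choices for illustration. It checks that the per-example gradients average exactly to the full gradient, and measures how far a random mini-batch estimate typically falls from it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
w = np.zeros(d)  # point at which all gradients are evaluated

# Per-example gradients of (x_i . w - y_i)^2, one row per data point.
per_example = 2.0 * X * (X @ w - y)[:, None]
# The full gradient of the mean objective is their average.
full_gradient = per_example.mean(axis=0)

def minibatch_error(batch_size, trials=2000):
    """Average distance between a random mini-batch gradient and the full gradient."""
    errors = []
    for _ in range(trials):
        idx = rng.choice(n, size=batch_size, replace=False)
        estimate = per_example[idx].mean(axis=0)
        errors.append(np.linalg.norm(estimate - full_gradient))
    return float(np.mean(errors))

errors = {b: minibatch_error(b) for b in (1, 10, 100)}
for b, e in errors.items():
    print(f"batch size {b:>3}: mean error {e:.3f}")
```

Under these assumptions the single-example estimate is noisy but correct on average, and the error shrinks as more examples are averaged. That noise is one reason SGD is usually run with a smaller (or decaying) learning rate than batch gradient descent.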
Core resources (read/watch one of the following)
→ Coursera: Machine Learning (2013)
Supplemental resources (the following are optional, but you may find them useful)
See also
- Natural gradient allows us to speed up stochastic gradient descent by accounting for the curvature of the objective function.
- Second-order optimization methods more generally often converge faster than first-order methods, though they are harder to apply in a stochastic framework.
- In large-scale machine learning, stochastic gradient descent achieves a good tradeoff between statistical error and optimization error.