(50 minutes to learn)
Value iteration is an iterative algorithm for computing the value function, and in turn the optimal policy, of a Markov decision process.
This concept has the prerequisites:
- Markov decision process (MDP) (value iteration computes the value function for Markov decision processes)
- Bellman equations (value iteration finds a fixed point for the MDP Bellman equations)
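The fixed point referred to above is the solution of the Bellman optimality equation; value iteration repeatedly applies the corresponding update until the values stop changing. In standard notation (not taken from the resources below, so symbols may differ slightly from theirs):

```latex
V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma \, V_k(s') \right]
```

Here $P(s' \mid s, a)$ is the transition probability, $R(s, a, s')$ the reward, and $\gamma \in [0, 1)$ the discount factor; the update is a contraction, so $V_k$ converges to the optimal value function $V^*$.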
Core resources (read/watch one of the following)
→ EdX Artificial Intelligence
Location: Week 7 - Lecture 8, parts 4-5, and quizzes 3-4
- navigate between lecture material using the slider at the top
→ Berkeley Artificial Intelligence CS188 (2013)
→ Artificial Intelligence: a Modern Approach
A textbook giving a broad overview of all of AI.
Location: Section 17.2 p. 652-656
- The policy typically converges long before the value function does, so policy iteration is often used instead of value iteration to learn an optimal policy. Both policy iteration and value iteration turn the Bellman updates into an iterative algorithm; the difference is whether you plug in a fixed policy and iteratively improve it (policy iteration) or search directly for the best policy by maximizing over all actions (value iteration).
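The "maximize over all actions" variant described above can be sketched in a few lines. The following is a minimal illustration on a made-up two-state MDP (the states, dynamics, and rewards are invented for this example, not taken from any of the resources):

```python
# Minimal value iteration sketch on a toy, hypothetical 2-state MDP.
GAMMA = 0.9               # discount factor (assumed)
STATES = [0, 1]
ACTIONS = ["stay", "go"]

# P[(s, a)] = list of (probability, next_state, reward) triples -- toy dynamics
P = {
    (0, "stay"): [(1.0, 0, 0.0)],
    (0, "go"):   [(0.8, 1, 1.0), (0.2, 0, 0.0)],
    (1, "stay"): [(1.0, 1, 2.0)],
    (1, "go"):   [(1.0, 0, 0.0)],
}

def q_value(V, s, a):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[(s, a)])

def value_iteration(tol=1e-6):
    """Repeat the Bellman optimality update until V stops changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in ACTIONS) for s in STATES}
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return new_V
        V = new_V

def greedy_policy(V):
    """Extract the optimal policy by acting greedily with respect to V."""
    return {s: max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES}

V = value_iteration()
print(greedy_policy(V))  # both states end up steering toward state 1
```

Policy iteration would instead alternate two steps: evaluate a fixed policy (solve for its value function) and then improve the policy greedily, repeating until the policy stops changing.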