value iteration

(50 minutes to learn)

Summary

Value iteration is an iterative algorithm that repeatedly applies the Bellman optimality update to compute the optimal value function, and in turn an optimal policy, for a Markov decision process.
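As a rough illustration of the summary above, here is a minimal sketch of value iteration in plain Python. The function name `value_iteration`, the table layout (`P[s][a]` as a list of next-state probabilities, `R[s][a]` as the expected immediate reward), the discount `gamma`, and the two-state toy problem are all assumptions made for this example; they are not taken from the lecture material.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Sketch of value iteration for a small, fully known MDP.

    P[s][a] is a list of probabilities over next states (assumed layout).
    R[s][a] is the expected immediate reward for action a in state s.
    """
    n_states = len(R)
    n_actions = len(R[0])

    def q_value(V, s, a):
        # One-step lookahead: immediate reward plus discounted expected value.
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))

    V = [0.0] * n_states
    for _ in range(max_iters):
        # Bellman optimality update: take the best action in every state.
        V_new = [max(q_value(V, s, a) for a in range(n_actions))
                 for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break

    # Read off a greedy policy from the (approximately) converged values.
    policy = [max(range(n_actions), key=lambda a: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy


# Hypothetical toy problem: two states, action 0 = "stay", action 1 = "move".
# Staying in state 1 pays a reward of 1; everything else pays 0.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay -> 0, move -> 1
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: stay -> 1, move -> 0
]
R = [
    [0.0, 0.0],  # rewards in state 0
    [1.0, 0.0],  # rewards in state 1
]

V, policy = value_iteration(P, R)
```

In this toy problem the values approach 9 for state 0 and 10 for state 1 (with `gamma = 0.9`), and the greedy policy is to move out of state 0 and stay in state 1.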

Context

This concept has the prerequisites:

Core resources (read/watch one of the following)

-Free-

EdX Artificial Intelligence
Location: Week 7 - Lecture 8 parts 4-5 and quiz 3-4
Authors: Pieter Abbeel, Dan Klein
Other notes:
  • navigate between lecture material using the slider at the top
Berkeley Artificial Intelligence CS188 (2013)

-Paid-

See also

  • The policy typically converges long before the value function does, so policy iteration is often used instead of value iteration to find an optimal policy. Both algorithms turn the Bellman equations into iterative updates; the difference is whether you plug in a fixed policy, evaluate it, and then improve it (policy iteration), or directly look for the best policy by maximizing over all actions at every update (value iteration).
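
To make the comparison concrete, the two updates can be written side by side. The notation here is an assumption chosen for illustration: $P(s' \mid s, a)$ for transition probabilities, $R(s, a)$ for the expected immediate reward, $\gamma$ for the discount factor, and $k$ for the iteration index.

```latex
% Value iteration: maximize over all actions at every update.
V_{k+1}(s) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_k(s') \Big]

% Policy iteration, step 1 (evaluation): the action is fixed by the current policy \pi.
V^{\pi}_{k+1}(s) = R(s, \pi(s)) + \gamma \sum_{s'} P(s' \mid s, \pi(s))\, V^{\pi}_k(s')

% Policy iteration, step 2 (improvement): act greedily with respect to V^{\pi}.
\pi_{\text{new}}(s) = \arg\max_{a} \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s') \Big]
```

The only structural difference between the first two lines is the $\max_a$: value iteration takes it at every update, while policy iteration postpones it to a separate improvement step.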