value iteration
(50 minutes to learn)
Summary
Value iteration is an iterative algorithm for computing the optimal value function, and in turn an optimal policy, for a Markov decision process.
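Concretely, each iteration sweeps over the states and applies the Bellman optimality backup (one standard formulation, for a discounted MDP with transition probabilities P, rewards R, and discount factor γ):

    V_{k+1}(s) = \max_a \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma V_k(s') \bigr]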
Context
This concept has the prerequisites:
- Markov decision process (MDP) (value iteration computes the value function for Markov decision processes)
- Bellman equations (value iteration finds a fixed point for the MDP Bellman equations)
Core resources (read/watch one of the following)
-Free-
→ EdX Artificial Intelligence
Location:
Week 7 - Lecture 8 parts 4-5 and quiz 3-4
Other notes:
- Navigate through the lecture material using the slider at the top
→ Berkeley Artificial Intelligence CS188 (2013)
-Paid-
→ Artificial Intelligence: A Modern Approach
A textbook giving a broad overview of all of AI.
Location:
Section 17.2, pp. 652-656
See also
- The policy typically converges long before the value function does, so policy iteration is often used instead of value iteration to learn an optimal policy. Both algorithms turn the Bellman updates into an iterative procedure; the difference is whether you plug in a fixed policy, evaluate it, and then improve it (policy iteration), or directly back up the best value over all actions at each state (value iteration). A minimal sketch of the value iteration loop appears below.
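As a rough illustration (not tied to any of the resources above), here is a minimal value iteration sketch for a small, fully known tabular MDP. The array layout, function name, discount factor, and tolerance are assumptions made for this example:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Sketch of value iteration for a tabular MDP.

    P[a, s, s2]: probability of landing in state s2 after taking action a in state s.
    R[a, s, s2]: reward for that transition. (Assumed array layout for this example.)
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup: expected return for every (action, state) pair
        Q = np.einsum('ast,ast->as', P, R + gamma * V[None, None, :])
        V_new = Q.max(axis=0)             # best action value at each state
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = Q.argmax(axis=0)             # greedy policy w.r.t. the converged values
    return V_new, policy
```

Policy iteration would instead alternate a full evaluation of the current policy with a greedy improvement step, rather than taking a max over actions inside every backup.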