(50 minutes to learn)
Value iteration is an iterative algorithm that repeatedly applies the Bellman update to compute the optimal value function, and in turn an optimal policy, for a Markov decision process.
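As a minimal sketch (not taken from the resources listed below), value iteration starts from V_0(s) = 0 and applies the Bellman optimality update V_{k+1}(s) = max_a Σ_{s'} P(s' | s, a) [R(s, a, s') + γ V_k(s')] to every state until the values stop changing, then reads off a greedy policy. The dictionary-based MDP format, the function name `value_iteration`, the tolerance, and the toy two-state MDP are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP format (an assumption for this sketch):
# transitions[state][action] -> list of (probability, next_state, reward).
# A state with no actions is treated as terminal with value 0.
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def value_iteration(transitions: MDP, gamma: float = 0.9, tol: float = 1e-8,
                    max_iters: int = 10_000) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Apply the Bellman optimality update to every state until values stop changing."""
    V = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        new_V = {}
        for s, actions in transitions.items():
            if not actions:  # terminal state
                new_V[s] = 0.0
                continue
            # Best expected immediate reward plus discounted value of the successor
            new_V[s] = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
        delta = max(abs(new_V[s] - V[s]) for s in transitions)
        V = new_V
        if delta < tol:
            break
    # Read off a greedy policy from the (approximately) converged values
    policy = {
        s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2])
                                          for p, s2, r in actions[a]))
        for s, actions in transitions.items()
        if actions
    }
    return V, policy


# Toy MDP: "risky" may reach the terminal "goal" (reward 10); "safe" earns 1 and stays put.
mdp: MDP = {
    "start": {
        "risky": [(0.5, "goal", 10.0), (0.5, "start", 0.0)],
        "safe": [(1.0, "start", 1.0)],
    },
    "goal": {},
}
V, policy = value_iteration(mdp)
# V["start"] ends up close to 10.0 and policy == {"start": "safe"}
```

Tracing the sweeps by hand for this toy MDP, the maximizing action is "risky" for the first five sweeps and switches to "safe" once the estimate passes about 8.9; the values then need well over a hundred further sweeps to settle near 10.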
This concept has the prerequisites:
Core resources (read/watch one of the following)
→ EdX Artificial Intelligence
Location: Week 7 - Lecture 8 parts 4-5 and quiz 3-4
- navigate between lecture material using the slider at the top
→ Berkeley Artificial Intelligence CS188 (2013)
Location: Lecture 8: Markov Decision Processes
→ Artificial Intelligence: a Modern Approach
A textbook giving a broad overview of all of AI.
Location: Section 17.2, pp. 652-656
- The policy typically converges long before the value function does; for this reason, policy iteration is typically used instead of value iteration to compute an optimal policy. Both policy iteration and value iteration turn a Bellman equation into an iterative algorithm. The difference is whether you fix a policy, evaluate it, and then improve it (policy iteration), or search directly for the best policy by maximizing over all actions in every update (value iteration).
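For contrast with value iteration, here is a minimal policy-iteration sketch under the same kind of hypothetical dictionary MDP. Policy evaluation runs Bellman updates with the action fixed by the current policy (no max over actions), and policy improvement then switches each state to the greedy action. The names `policy_iteration` and `q_value`, the iterative (rather than linear-solve) evaluation step, the round counter, and the toy MDP are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP format: transitions[state][action] -> list of
# (probability, next_state, reward); a state with no actions is terminal.
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(transitions: MDP, V: Dict[str, float], s: str, a: str, gamma: float) -> float:
    """Expected immediate reward plus discounted value of the successor state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][a])


def policy_iteration(transitions: MDP, gamma: float = 0.9, tol: float = 1e-10):
    # Arbitrary initial policy: the first listed action in each non-terminal state
    policy = {s: next(iter(acts)) for s, acts in transitions.items() if acts}
    V = {s: 0.0 for s in transitions}  # terminal states keep value 0
    rounds = 0
    while True:
        rounds += 1
        # Policy evaluation: Bellman updates with the action fixed by the policy (no max)
        while True:
            delta = 0.0
            for s, a in policy.items():
                v = q_value(transitions, V, s, a, gamma)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: switch to the greedy action where it is strictly better
        stable = True
        for s in policy:
            best = max(transitions[s], key=lambda a: q_value(transitions, V, s, a, gamma))
            current = q_value(transitions, V, s, policy[s], gamma)
            if q_value(transitions, V, s, best, gamma) > current + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return V, policy, rounds


# Toy MDP: "risky" may reach the terminal "goal" (reward 10); "safe" earns 1 and stays put.
mdp: MDP = {
    "start": {
        "risky": [(0.5, "goal", 10.0), (0.5, "start", 0.0)],
        "safe": [(1.0, "start", 1.0)],
    },
    "goal": {},
}
V, policy, rounds = policy_iteration(mdp)
# policy == {"start": "safe"} after rounds == 2
```

Here the loop starts from "risky", switches to "safe" after the first improvement step, and stops after the second round because the policy no longer changes. Each round still runs a full inner evaluation loop, which is where most of the computation goes.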