policy iteration
(50 minutes to learn)
Summary
Policy iteration is a two-step iterative algorithm for computing an optimal policy for a Markov decision process. It alternates between (i) policy evaluation: computing the value function of the current policy (which may be initialized arbitrarily), and (ii) policy improvement: updating the policy to select, in each state, the action that maximizes the values computed in the evaluation step. Policy iteration generally converges to an optimal policy in far fewer iterations than value iteration, although each iteration is more expensive because it requires a full policy evaluation.
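The two alternating steps can be sketched as follows. This is a minimal illustration on a hypothetical two-state, two-action MDP; the transition and reward arrays are made-up values for demonstration, not taken from any of the resources below. Policy evaluation is done exactly here by solving a linear system, though an iterative sweep would also work.

```python
import numpy as np

# Hypothetical toy MDP (illustrative numbers only):
# P[s, a, s'] = transition probability, R[s, a] = expected immediate reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions out of state 0
    [[0.5, 0.5], [0.0, 1.0]],   # transitions out of state 1
])
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9  # discount factor

def policy_evaluation(policy):
    """Step (i): solve V = R_pi + gamma * P_pi V exactly for the fixed policy."""
    n = P.shape[0]
    P_pi = P[np.arange(n), policy]   # transition matrix under the policy
    R_pi = R[np.arange(n), policy]   # reward vector under the policy
    return np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)

def policy_iteration():
    policy = np.zeros(P.shape[0], dtype=int)   # arbitrary initial policy
    while True:
        V = policy_evaluation(policy)
        # Step (ii): greedy improvement via one-step lookahead values Q[s, a].
        Q = R + gamma * (P @ V)
        new_policy = np.argmax(Q, axis=1)
        if np.array_equal(new_policy, policy):  # policy stable => optimal
            return policy, V
        policy = new_policy
```

On this toy MDP the loop stabilizes after a few improvements; the stopping test (the greedy policy no longer changes) is exactly the convergence criterion the summary describes.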
Context
This concept has the prerequisites:
- value iteration (policy iteration can be viewed as value iteration in which we reuse the action last found optimal for each state, rather than maximizing over every action at every update)
- Bellman equations (policy iteration uses the Bellman equations for MDPs)
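Concretely, the standard form of the two Bellman equations the algorithm uses (in common MDP notation, with discount factor $\gamma$) is: policy evaluation solves the Bellman expectation equation for the fixed policy $\pi$,

```latex
V^{\pi}(s) = \sum_{s'} P(s' \mid s, \pi(s)) \left[ R(s, \pi(s), s') + \gamma V^{\pi}(s') \right],
```

and policy improvement applies a greedy one-step lookahead,

```latex
\pi'(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^{\pi}(s') \right].
```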
Core resources (read/watch one of the following)
-Free-
→ EdX Artificial Intelligence
Location:
Week 7 Lecture 9: Markov Decision Processes II
Other notes:
- navigate between lecture material using the slider at the top
→ Berkeley Artificial Intelligence CS188 (2013)
-Paid-
→ Artificial Intelligence: a Modern Approach
A textbook giving a broad overview of all of AI.
Location:
Section 17.3 p. 656-658
See also
-No Additional Notes-