policy iteration

(50 minutes to learn)


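Policy iteration is a two-step iterative algorithm for computing an optimal policy for a Markov decision process. It alternates between (i) policy evaluation: computing the value function of the current, fixed policy (the initial policy can be chosen at random), and (ii) policy improvement: updating the policy so that each state selects the action that maximizes expected value under the value function computed in step (i). The algorithm stops when the improvement step leaves the policy unchanged. Policy iteration typically converges to an optimal policy in far fewer iterations than value iteration, although each iteration is more expensive because it includes a full policy evaluation.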
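
The two alternating steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the lecture: the MDP is finite and given as nested lists, where P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the expected immediate reward. The names evaluate_policy, q_values and policy_iteration, the two-state toy MDP, and the discount factor 0.9 are all made up for this example. Policy evaluation is done here by repeated sweeps rather than by solving the linear system exactly.

```python
def evaluate_policy(policy, P, R, gamma, tol=1e-10):
    """Step (i): approximate the value function of a fixed policy by repeated sweeps."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s, a in enumerate(policy):
            v = R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def q_values(s, V, P, R, gamma):
    """One-step lookahead value of every action available in state s."""
    return [R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            for a in range(len(P[s]))]


def policy_iteration(P, R, gamma, policy):
    """Alternate evaluation and greedy improvement until the policy stops changing."""
    policy = list(policy)
    while True:
        V = evaluate_policy(policy, P, R, gamma)
        stable = True
        for s in range(len(P)):
            q = q_values(s, V, P, R, gamma)
            best = max(range(len(q)), key=lambda a: q[a])
            # Step (ii): switch only on a strict improvement, so ties keep
            # the current action and the loop cannot cycle.
            if q[best] > q[policy[s]] + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Hypothetical toy MDP: action 0 = "stay", action 1 = "move to the other state".
# Only staying in state 1 pays a reward (of 1).
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # transitions from state 0
    [[(1, 1.0)], [(0, 1.0)]],  # transitions from state 1
]
R = [
    [0.0, 0.0],  # rewards in state 0
    [1.0, 0.0],  # rewards in state 1
]

policy, V = policy_iteration(P, R, gamma=0.9, policy=[0, 1])
print(policy, V)
```

Starting from the deliberately poor policy [0, 1], the sketch settles on moving in state 0 and staying in state 1, with values close to 9 and 10 respectively. The strict-improvement test in step (ii) is a design choice that guarantees termination when several actions are equally good.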


This concept has the prerequisites:

Core resources (read/watch one of the following)


EdX Artificial Intelligence
Location: Week 7 Lecture 9: Markov Decision Processes II
Authors: Pieter Abbeel, Dan Klein
Other notes:
  • navigate between lecture material using the slider at the top
Berkeley Artificial Intelligence CS188 (2013)

