policy iteration

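Policy iteration is a two-step iterative algorithm for computing an optimal policy for a Markov decision process (MDP). Starting from an arbitrary policy (which may be initialized randomly), it alternates between (i) policy evaluation, which computes the value function of the current fixed policy, and (ii) policy improvement, which sets the policy in each state to the action that maximizes the expected return under the values computed in step (i). The algorithm stops when the improvement step leaves the policy unchanged; for a finite MDP this happens after a finite number of iterations. Policy iteration typically needs far fewer iterations than value iteration to converge, although each iteration is more expensive because it requires a full policy evaluation.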
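A minimal sketch of the two steps in Python, assuming a finite MDP stored as `P[s][a]`, a list of `(probability, next_state, reward)` triples, and a discount factor `gamma < 1`. The functions `evaluate_policy`, `q_value` and `policy_iteration`, and the 3-state chain `P`, are hypothetical names chosen for illustration. Policy evaluation is done iteratively to a small tolerance rather than by solving the linear system exactly, and the improvement step switches actions only on a strict gain so that ties cannot make the policy cycle.

```python
from typing import List, Tuple

# Assumed model format: P[s][a] is a list of (probability, next_state, reward).
MDP = List[List[List[Tuple[float, int, float]]]]


def q_value(P: MDP, V: List[float], s: int, a: int, gamma: float) -> float:
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P: MDP, policy: List[int], gamma: float,
                    tol: float = 1e-10) -> List[float]:
    """Step (i): iterative policy evaluation of a fixed deterministic policy."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place (Gauss-Seidel style) update
        if delta < tol:
            return V


def policy_iteration(P: MDP, gamma: float = 0.9) -> Tuple[List[int], List[float]]:
    policy = [0] * len(P)  # arbitrary initial policy: action 0 everywhere
    while True:
        V = evaluate_policy(P, policy, gamma)
        stable = True
        for s in range(len(P)):  # step (ii): greedy policy improvement
            qs = [q_value(P, V, s, a, gamma) for a in range(len(P[s]))]
            best = max(range(len(qs)), key=qs.__getitem__)
            # Switch only on a strict improvement, so ties cannot cause cycling.
            if qs[best] > qs[policy[s]] + 1e-9:
                policy[s] = best
                stable = False
        if stable:  # no state changed its action: policy is optimal
            return policy, V


# Hypothetical 3-state chain: action 1 moves right, action 0 moves left or stays.
# Reaching state 2 from state 1 pays reward 1; state 2 is absorbing.
P: MDP = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],
    [[(1.0, 0, 0.0)], [(1.0, 2, 1.0)]],
    [[(1.0, 2, 0.0)], [(1.0, 2, 0.0)]],
]

policy, V = policy_iteration(P, gamma=0.9)
print(policy)  # [1, 1, 0]
```

On this toy chain the loop runs three evaluation/improvement rounds before the policy stops changing, ending with "move right" in states 0 and 1 and values of roughly 0.9, 1.0 and 0.0.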


