# Markov decision process (MDP)

(50 minutes to learn)

## Summary

A Markov Decision Process (MDP) is a mathematical framework for handling search/planning problems where the outcome of actions are uncertain (non-deterministic). MDPs aim to maximize the expected utility (minimize the expected loss) throughout the search/planning.

## Context

This concept has the prerequisites:

## Goals

• Understand the four components that comprise an MDP
• Understand the following terminology: states, actions, rewards, transition functions/probabilities, policy, utility, discount parameter, q-states, value
• Understand the recursive definition of value
• Understand the justification for discount parameters (especially with infinite time MDPs)

## -Free-

Berkeley Artificial Intelligence CS188 (2013)
Author: Pieter Abbeel
Other notes:
• The lecture provides definitions, descriptions, and examples of MDPs -- it briefly covers value iteration (a method for finding optimal policies) near the end of the lecture
EdX Artificial Intelligence
Location: Week 7 - Lecture 8: Markov Decision Processes (part 1-3 and quiz 1-2)
Authors: Pieter Abbeel,Dan Klein
Other notes:
• navigate between lecture material using the slider at the top

Wikipedia
Location:

## See also

• There are several methods for finding the optimal policy for an MDP:
• Value iteration is an application of dynamic programming that recursively computes the value function. It does not scale well as it has a complexity of O(S^2A) for S states and A actions.
• Policy iteration is similar to value iteration, but it alternates between determining the value function given a fixed policy and choosing a policy given a fixed value function. It tends to converge much quicker than value iteration.
The Bellman equations characterize the optimal values for MDPs and are ubiquitous in reinforcement learning.