Markov decision process (MDP)
(50 minutes to learn)
A Markov Decision Process (MDP) is a mathematical framework for handling search/planning problems where the outcome of actions are uncertain (non-deterministic). MDPs aim to maximize the expected utility (minimize the expected loss) throughout the search/planning.
This concept has the prerequisites:
- Understand the four components that comprise an MDP
- Understand the following terminology: states, actions, rewards, transition functions/probabilities, policy, utility, discount parameter, q-states, value
- Understand the recursive definition of value
- Understand the justification for discount parameters (especially with infinite time MDPs)
Core resources (read/watch one of the following)
→ Berkeley Artificial Intelligence CS188 (2013)
- The lecture provides definitions, descriptions, and examples of MDPs -- it briefly covers value iteration (a method for finding optimal policies) near the end of the lecture
→ EdX Artificial Intelligence
Location: Week 7 - Lecture 8: Markov Decision Processes (part 1-3 and quiz 1-2)
- navigate between lecture material using the slider at the top
→ Artificial Intelligence: a Modern Approach
A textbook giving a broad overview of all of AI.
Location: Section 17.1 p. 645-651
Supplemental resources (the following are optional, but you may find them useful)
Location: Article: Markov decision process
- There are several methods for finding the optimal policy for an MDP:
- Value iteration is an application of dynamic programming that recursively computes the value function. It does not scale well as it has a complexity of O(S^2A) for S states and A actions.
- Policy iteration is similar to value iteration, but it alternates between determining the value function given a fixed policy and choosing a policy given a fixed value function. It tends to converge much quicker than value iteration.
- create concept: shift + click on graph
- change concept title: shift + click on existing concept
- link together concepts: shift + click drag from one concept to another
- remove concept from graph: click on concept then press delete/backspace
- add associated content to concept: click the small circle that appears on the node when hovering over it
- other actions: use the icons in the upper right corner to optimize the graph placement, preview the graph, or download a json representation