# Jensen's inequality

## Summary

Jensen's inequality states that, for a convex function, the expectation of the function of a random variable is at least as large as the function of the expectation. It is used to prove the Rao-Blackwell theorem in statistics, and it underlies many algorithms for probabilistic inference, including Expectation-Maximization (EM) and variational inference.
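As a quick numerical illustration, the sketch below checks the inequality E[f(X)] ≥ f(E[X]) for the convex function f(x) = x², and the reversed inequality for the concave function log. The distribution and the helper `expectation` are hypothetical, chosen only for this example.

```python
import math

def expectation(values, probs, f=lambda x: x):
    """Expectation of f(X) for a discrete random variable X."""
    return sum(p * f(x) for x, p in zip(values, probs))

# Hypothetical discrete distribution, chosen only for illustration.
values = [1.0, 2.0, 6.0]
probs = [0.5, 0.3, 0.2]

def square(x):
    """A convex function."""
    return x * x

mean = expectation(values, probs)          # E[X]
lhs = expectation(values, probs, square)   # E[f(X)]
rhs = square(mean)                         # f(E[X])

# For f(x) = x^2 the gap E[X^2] - (E[X])^2 is exactly the variance of X.
jensen_gap = lhs - rhs

print(f"E[X^2] = {lhs:.4f} >= (E[X])^2 = {rhs:.4f}, gap = {jensen_gap:.4f}")

# For a concave function such as log, the inequality reverses.
print(expectation(values, probs, math.log) <= math.log(mean))
```

For the squaring function the gap between the two sides is the variance of X, so the inequality holds with equality only when X is constant.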

## Context

This concept has the prerequisites:

- convex functions (Jensen's inequality applies to convex functions.)
- expectation and variance (Jensen's inequality involves expectations.)

## Core resources (we're sorry, we haven't finished tracking down resources for this concept yet)

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ Information Theory, Inference, and Learning Algorithms

A graduate-level textbook on machine learning and information theory.

Location:
Section 2.7, "Jensen's inequality for convex functions," pages 35-36

## -Paid-

→ Elements of Information Theory

A graduate-level textbook on information theory.

Location:
Section 2.6, "Jensen's inequality and its consequences," up to Theorem 2.6.2, pages 25-27

→ A First Course in Probability

An introductory probability textbook.

Location:
Section 8.5, "Other inequalities," page 453

## See also

- Some uses of Jensen's inequality:
  - showing that KL divergence, a measure of dissimilarity between probability distributions, is nonnegative
  - showing that the EM algorithm [increases the likelihood function](expectation_maximization_variational_interpretation)
  - variational Bayes, a general framework for approximate inference in probabilistic models
  - the Rao-Blackwell theorem, which shows that estimators should depend only on sufficient statistics if the loss function is convex
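
To make the first use above concrete, here is a short derivation sketch for the discrete case. It assumes distributions p and q over the same set, sums only over points where p(x) > 0, and applies Jensen's inequality to the convex function -log. The notation is chosen for this example and is not taken from the resources listed above.

```latex
\begin{align*}
D_{\mathrm{KL}}(p \,\|\, q)
  &= \sum_{x :\, p(x) > 0} p(x) \log \frac{p(x)}{q(x)}
   = \mathbb{E}_{p}\!\left[ -\log \frac{q(X)}{p(X)} \right] \\
  &\ge -\log \mathbb{E}_{p}\!\left[ \frac{q(X)}{p(X)} \right]
   && \text{(Jensen's inequality, $-\log$ is convex)} \\
  &= -\log \sum_{x :\, p(x) > 0} q(x)
   \;\ge\; -\log 1 = 0 .
\end{align*}
```

The final step uses the fact that q sums to at most 1 over the support of p.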