Jensen's inequality
Summary
Jensen's inequality states that, for a convex function f, the expectation of the function is at least as large as the function of the expectation: E[f(X)] ≥ f(E[X]). For a concave function the inequality is reversed. It is used to prove the Rao-Blackwell theorem in statistics, and it underlies many algorithms for probabilistic inference, including Expectation-Maximization (EM) and variational inference.
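As a quick numerical illustration, the following sketch checks the inequality on a small discrete distribution. The values, probabilities, and the helper `expectation` are hypothetical and chosen only for this example; f(x) = x^2 stands in for a convex function and the natural logarithm for a concave one.

```python
import math

# Hypothetical discrete distribution, chosen only for illustration.
values = [1.0, 2.0, 6.0]
probs = [0.5, 0.3, 0.2]


def expectation(f, values, probs):
    """Return E[f(X)] for a discrete random variable X."""
    return sum(p * f(x) for x, p in zip(values, probs))


mean = expectation(lambda x: x, values, probs)  # E[X]

# Convex case, f(x) = x**2: E[f(X)] >= f(E[X]).
lhs_convex = expectation(lambda x: x ** 2, values, probs)
rhs_convex = mean ** 2

# Concave case, f(x) = log(x): the inequality is reversed.
lhs_concave = expectation(math.log, values, probs)
rhs_concave = math.log(mean)

print("convex: ", lhs_convex, ">=", rhs_convex)
print("concave:", lhs_concave, "<=", rhs_concave)
```

For f(x) = x^2 the gap E[X^2] - (E[X])^2 is exactly the variance of X, which is one way to see why the gap is never negative.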
Context
This concept has the prerequisites:
- convex functions (Jensen's inequality applies to convex functions.)
- expectation and variance (Jensen's inequality involves expectations.)
Core resources (we're sorry, we haven't finished tracking down resources for this concept yet)
Supplemental resources (the following are optional, but you may find them useful)
-Free-
→ Information Theory, Inference, and Learning Algorithms
A graduate-level textbook on machine learning and information theory.
Location:
Section 2.7, "Jensen's inequality for convex functions," pages 35-36
-Paid-
→ Elements of Information Theory
A graduate-level textbook on information theory.
Location:
Section 2.6, "Jensen's inequality and its consequences," up to Theorem 2.6.2, pages 25-27
→ A First Course in Probability
An introductory probability textbook.
Location:
Section 8.5, "Other inequalities," page 453
See also
- Some uses of Jensen's inequality:
- showing that KL divergence, a measure of dissimilarity between probability distributions, is nonnegative
- showing that the EM algorithm [increases the likelihood function](expectation_maximization_variational_interpretation)
- variational Bayes, a general framework for approximate inference in probabilistic models
- the Rao-Blackwell theorem, which shows that estimators should only consider sufficient statistics if the loss function is convex
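
The first use listed above can be sketched in a few lines. This is a minimal derivation, assuming discrete distributions p and q (symbols introduced here for illustration) with q(x) > 0 wherever p(x) > 0; it applies Jensen's inequality to the concave function log.

```latex
\begin{align*}
-D_{\mathrm{KL}}(p \,\|\, q)
  &= \sum_{x:\, p(x) > 0} p(x) \log \frac{q(x)}{p(x)} \\
  &\le \log \sum_{x:\, p(x) > 0} p(x) \, \frac{q(x)}{p(x)}
     && \text{(Jensen, since $\log$ is concave)} \\
  &= \log \sum_{x:\, p(x) > 0} q(x) \\
  &\le \log 1 = 0 .
\end{align*}
```

Hence D_KL(p || q) ≥ 0.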