# sufficient statistics

(1 hour to learn)

## Summary

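Sufficient statistics are statistics which summarize all of the information a dataset contains about the parameters of a distribution. The Rao-Blackwell Theorem shows that conditioning an estimator on a sufficient statistic never increases its expected loss under a convex loss function, which implies that statistical estimators should depend on the data only through sufficient statistics when they exist.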
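As a rough sketch of what this means formally (the symbols $x = (x_1, \dots, x_n)$, $\theta$, $T$, $g$, and $h$ are notation assumed here, and the resources below may use different conventions): a statistic $T(x)$ is sufficient for $\theta$ if the conditional distribution of the data given $T$ does not depend on $\theta$,

$$
p(x \mid T(x) = t, \theta) = p(x \mid T(x) = t).
$$

The factorization criterion states that $T$ is sufficient if and only if the distribution can be written as

$$
p(x \mid \theta) = g(T(x), \theta) \, h(x),
$$

where $h$ does not depend on $\theta$. For example, for $n$ independent Bernoulli($\theta$) observations with $t = \sum_{i=1}^n x_i$,

$$
p(x \mid \theta) = \prod_{i=1}^n \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^{t} (1 - \theta)^{n - t},
$$

so one can take $g(t, \theta) = \theta^{t} (1 - \theta)^{n - t}$ and $h(x) = 1$, and the number of successes $t$ is a sufficient statistic.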

## Context

This concept has the prerequisites:

- random variables (Sufficient statistics are a way of analyzing probability distributions.)
- conditional distributions (Sufficient statistics are defined in terms of conditional distributions.)

## Goals

- Know the definition of a sufficient statistic

- Derive an equivalent criterion in terms of a factorization of the distribution

- Prove the Rao-Blackwell Theorem, which implies that estimators should be based on sufficient statistics when they exist.
- Note: the general form of the Rao-Blackwell Theorem, which applies to convex loss functions, depends on Jensen's inequality, but many texts give only the special case of squared error.
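
The definition and the squared-error case of the Rao-Blackwell Theorem can be sanity-checked numerically on a small example. The following is a minimal sketch, not a proof. It assumes i.i.d. Bernoulli(p) data, uses the sum of the observations as the sufficient statistic, and computes everything exactly by enumerating all 2^n samples. The function names are illustrative and do not come from any of the resources listed here.

```python
from itertools import product
from math import comb, isclose


def outcome_prob(x, p):
    """Probability of one specific 0/1 sequence x under i.i.d. Bernoulli(p)."""
    t = sum(x)
    return p ** t * (1 - p) ** (len(x) - t)


def conditional_given_sum(x, p):
    """P(X = x | T = sum(x)), computed by brute-force enumeration."""
    n, t = len(x), sum(x)
    # P(T = t): total probability of all sequences with the same sum as x.
    p_t = sum(outcome_prob(y, p)
              for y in product((0, 1), repeat=n) if sum(y) == t)
    return outcome_prob(x, p) / p_t


def mse(estimator, n, p):
    """Exact mean squared error of estimator(x) as an estimate of p."""
    return sum(outcome_prob(x, p) * (estimator(x) - p) ** 2
               for x in product((0, 1), repeat=n))


def naive(x):
    """Unbiased but wasteful: uses only the first observation."""
    return x[0]


def rao_blackwellized(x):
    """E[X_1 | T = t] = t / n by symmetry, i.e. the sample mean."""
    return sum(x) / len(x)


if __name__ == "__main__":
    n = 5
    x = (1, 0, 1, 0, 0)
    print("1 / C(n, t) =", 1 / comb(n, sum(x)))
    for p in (0.2, 0.5, 0.9):
        print(p,
              conditional_given_sum(x, p),   # should not vary with p
              mse(naive, n, p),              # p * (1 - p)
              mse(rao_blackwellized, n, p))  # p * (1 - p) / n
```

For every value of p, the conditional probability should equal 1 / C(n, t) up to floating-point rounding, which is what sufficiency of the sum requires. The estimator conditioned on the sum should have mean squared error p(1 - p) / n, compared with p(1 - p) for the naive one.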

## Core resources (read/watch one of the following)

## -Paid-

→ Probability and Statistics

An introductory textbook on probability theory and statistics.

Location:
Section 7.7, "Sufficient statistics," pages 443-448

→ Mathematical Statistics and Data Analysis

An undergraduate statistics textbook.

Location:
Section 8.8, "Sufficiency," pages 305-310

## Supplemental resources (the following are optional, but you may find them useful)

## -Paid-

→ All of Statistics

A very concise introductory statistics textbook.

Location:
Section 9.13.2, "Sufficiency," pages 137-140

## See also

- Exponential families are a class of distributions which can be parameterized in terms of sufficient statistics.
- The maximum likelihood estimator depends only on sufficient statistics.
- Many sufficient statistics correspond to moments of a distribution.
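
The connections above can be sketched in formulas, using notation that is assumed here rather than taken from the listed resources. An exponential family has densities of the form

$$
p(x \mid \eta) = h(x) \exp\left( \eta^\top T(x) - A(\eta) \right),
$$

which is already in the factorized form $g(T(x), \eta) \, h(x)$, so $T(x)$ is a sufficient statistic. More generally, whenever $p(x \mid \theta) = g(T(x), \theta) \, h(x)$ with $h(x) > 0$,

$$
\arg\max_\theta \; p(x \mid \theta) = \arg\max_\theta \; g(T(x), \theta),
$$

so the maximizer, when it is unique, depends on the data only through $T(x)$. For a univariate Gaussian, one valid choice is $T(x) = (x, x^2)$, whose expectations are the first and second moments of the distribution.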