sufficient statistics
(1 hour to learn)
Summary
Sufficient statistics are functions of a dataset that summarize all of the information it contains about the parameters of a distribution. The Rao-Blackwell Theorem implies that statistical estimators should depend on the data only through sufficient statistics when they exist.
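As a rough illustration of the Rao-Blackwell idea, the sketch below simulates i.i.d. Bernoulli(p) samples and compares a crude unbiased estimator (the first observation alone) with its conditional expectation given the sufficient statistic (the sum), which works out to the sample mean. The Bernoulli model, the function names, and the values of p, n, and the trial count are all assumptions chosen for illustration.

```python
import random


def crude_estimator(sample):
    # Unbiased for p, but it ignores every observation except the first.
    return sample[0]


def rao_blackwellized(sample):
    # For i.i.d. Bernoulli draws, E[X_1 | sum(X) = s] = s / n by symmetry,
    # so conditioning the crude estimator on the sum gives the sample mean.
    return sum(sample) / len(sample)


def mean_squared_error(estimator, p, n, trials, rng):
    # Monte Carlo estimate of E[(estimator(X) - p)^2].
    total = 0.0
    for _ in range(trials):
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        total += (estimator(sample) - p) ** 2
    return total / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
p, n, trials = 0.3, 20, 5000  # illustrative values, not from the text

mse_crude = mean_squared_error(crude_estimator, p, n, trials, rng)
mse_rb = mean_squared_error(rao_blackwellized, p, n, trials, rng)
print("crude MSE:", mse_crude)
print("Rao-Blackwellized MSE:", mse_rb)
```

In theory the crude estimator has mean squared error p(1 - p), while the conditioned version has p(1 - p) / n, so the simulated second number should come out roughly n times smaller.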
Context
This concept has the prerequisites:
- random variables (Sufficient statistics are a way of analyzing probability distributions.)
- conditional distributions (Sufficient statistics are defined in terms of conditional distributions.)
Goals
- Know the definition of a sufficient statistic
- Derive an equivalent criterion in terms of a factorization of the distribution
- Prove the Rao-Blackwell Theorem, which implies that estimators should be based on sufficient statistics when they exist.
- Note: the general form of the Rao-Blackwell Theorem, which applies to convex loss functions, depends on Jensen's inequality, but many texts give the special case for squared error.
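For orientation, the three goals above can be written out as follows. The notation is an assumption made here, not something the page fixes: X is the data, \theta the parameter, T a statistic, \delta an estimator with finite variance, and g, h nonnegative functions.

```latex
% Definition: T is sufficient for \theta if the conditional
% distribution of the data given T does not depend on \theta.
\[
  p(x \mid T(X) = t;\, \theta) = p(x \mid T(X) = t)
  \quad \text{for all } \theta .
\]

% Factorization criterion: T is sufficient if and only if
\[
  p(x;\, \theta) = g\bigl(T(x), \theta\bigr)\, h(x) .
\]

% Rao-Blackwell Theorem, squared-error case: conditioning an
% estimator on a sufficient statistic never increases its risk.
\[
  \tilde{\delta}(X) = \mathbb{E}\bigl[\delta(X) \mid T(X)\bigr],
  \qquad
  \mathbb{E}_\theta\bigl[(\tilde{\delta}(X) - \theta)^2\bigr]
  \le
  \mathbb{E}_\theta\bigl[(\delta(X) - \theta)^2\bigr] .
\]
```

Sufficiency is what makes \tilde{\delta} a legitimate estimator: because the conditional distribution of X given T(X) is free of \theta, the conditional expectation can be computed without knowing \theta.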
Core resources (read/watch one of the following)
-Paid-
→ Probability and Statistics
An introductory textbook on probability theory and statistics.
Location:
Section 7.7, "Sufficient statistics," pages 443-448
→ Mathematical Statistics and Data Analysis
An undergraduate statistics textbook.
Location:
Section 8.8, "Sufficiency," pages 305-310
Supplemental resources (the following are optional, but you may find them useful)
-Paid-
→ All of Statistics
A very concise introductory statistics textbook.
Location:
Section 9.13.2, "Sufficiency," pages 137-140
See also
- Exponential families are a class of distributions which can be parameterized in terms of sufficient statistics.
- The maximum likelihood estimator depends only on sufficient statistics.
- Many sufficient statistics correspond to moments of a distribution.
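The last two points can be seen together in a small sketch. Assuming an i.i.d. Gaussian model (an illustrative choice, as are the function names and simulated data), the maximum likelihood estimates of the mean and variance can be computed from the triple (n, sum of x, sum of x squared) alone, and the last two entries are unnormalized first and second sample moments.

```python
import math
import random
import statistics


def gaussian_sufficient_stats(data):
    # (n, sum of x, sum of x^2) is sufficient for (mu, sigma^2)
    # under an i.i.d. Gaussian model.
    return len(data), sum(data), sum(x * x for x in data)


def gaussian_mle_from_stats(n, s1, s2):
    # The MLE uses only the sufficient statistics, never the raw data.
    mu_hat = s1 / n
    var_hat = s2 / n - mu_hat ** 2  # MLE of variance (divides by n)
    return mu_hat, var_hat


rng = random.Random(1)  # fixed seed so the run is repeatable
data = [rng.gauss(2.0, 1.5) for _ in range(1000)]  # simulated data

mu_hat, var_hat = gaussian_mle_from_stats(*gaussian_sufficient_stats(data))

# Cross-check against estimates computed directly from the raw data.
print(math.isclose(mu_hat, statistics.mean(data), rel_tol=1e-9))
print(math.isclose(var_hat, statistics.pvariance(data), rel_tol=1e-6))
```

Any two datasets that share the same triple give the same estimates, which is the sense in which the estimator depends on the data only through the sufficient statistics.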