the bootstrap

(4.8 hours to learn)


The bootstrap is a Monte Carlo technique for estimating variances or confidence intervals of statistical estimators. It uses the empirical distribution as a proxy for the true distribution, and measures the accuracy of the estimator on datasets resampled from the empirical distribution. It is widely applicable and doesn't require assuming a parametric form for the true distribution.
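As a concrete illustration of the idea, here is a minimal sketch of the nonparametric bootstrap (assuming NumPy; the sample median is used as the estimator purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed sample from an unknown "true" distribution
data = rng.exponential(scale=2.0, size=200)

# Estimator whose accuracy we want to assess: the sample median
theta_hat = np.median(data)

# Nonparametric bootstrap: resample the data with replacement
# (i.e. simulate from the empirical distribution) and recompute
# the estimator on each resampled dataset
B = 2000
boot_estimates = np.array([
    np.median(rng.choice(data, size=data.size, replace=True))
    for _ in range(B)
])

# The spread of the bootstrap estimates approximates the
# sampling variability of the estimator
boot_se = boot_estimates.std(ddof=1)

# Percentile 95% confidence interval
ci_lo, ci_hi = np.quantile(boot_estimates, [0.025, 0.975])
```

Note that no parametric form was assumed anywhere: the empirical distribution of the observed data stands in for the true distribution.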


The core goals of this concept are:


  • Know the procedures for both the parametric and nonparametric bootstrap
    • When would you choose one over the other?
    • Note: for the parametric bootstrap, it may help to know about a point estimator such as maximum likelihood, but you can treat this as a black box.
  • Be able to use the bootstrap to:
    • estimate the variance of an estimator
    • compute a confidence interval for an estimator
  • The nonparametric bootstrap introduces two sources of error: using the empirical distribution as a proxy for the true distribution, and repeatedly simulating from the empirical distribution. Which of these would you expect to be a larger source of error?
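For contrast with the nonparametric version above, a sketch of the parametric bootstrap (assuming NumPy and, for illustration, an exponential model whose MLE for the scale parameter is the sample mean; the point estimator is treated as a black box, as the note above suggests):

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed data, modeled as i.i.d. Exponential(scale)
data = rng.exponential(scale=2.0, size=200)

# Point estimate of the model parameter (here, the MLE of the
# exponential scale is simply the sample mean)
scale_hat = data.mean()

# Parametric bootstrap: simulate fresh datasets from the *fitted*
# parametric model, rather than resampling the observed data
B = 2000
boot_estimates = np.array([
    rng.exponential(scale=scale_hat, size=data.size).mean()
    for _ in range(B)
])

boot_se = boot_estimates.std(ddof=1)            # estimated standard error
ci_lo, ci_hi = np.quantile(boot_estimates, [0.025, 0.975])  # 95% percentile CI
```

Roughly speaking, the parametric version is attractive when you trust the parametric model (it can be more efficient, especially for small samples), while the nonparametric version avoids committing to any parametric form.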

Core resources (read/watch one of the following)


CMU 36-402, Advanced data analysis: the bootstrap
  • Section 1, "Stochastic models, uncertainty, sampling distributions," pages 2-4
  • Section 2, "The bootstrap principle," pages 4-15
  • Section 3, "Non-parametric bootstrapping," pages 15-18
Author: Cosma Shalizi


Supplemental resources (the following are optional, but you may find them useful)


See also

-No Additional Notes-