bagging

(1.6 hours to learn)

Summary

Bagging is a technique for reducing the variance of a learning algorithm by averaging the predictions of models trained on random resamplings (bootstrap samples) of the training data. It can improve the performance of unstable algorithms such as decision trees.
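
To make the procedure concrete, here is a minimal sketch in Python (not taken from any of the resources below; the function name bagged_predict is illustrative, and integer class labels 0, 1, ..., K-1 are assumed so votes can be tallied with np.bincount):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagged_predict(X_train, y_train, X_test, n_models=25, seed=0):
        """Bagging: train one tree per bootstrap sample, combine by majority vote.

        X_train, y_train, X_test are NumPy arrays; y_train holds integer labels.
        """
        rng = np.random.default_rng(seed)
        n = len(X_train)
        all_preds = []
        for _ in range(n_models):
            # Bootstrap sample: draw n training points with replacement.
            idx = rng.integers(0, n, size=n)
            tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
            all_preds.append(tree.predict(X_test))
        all_preds = np.stack(all_preds)  # shape: (n_models, n_test)
        # Majority vote per test point (for regression, take the mean instead).
        return np.array([np.bincount(votes).argmax() for votes in all_preds.T])

Each tree sees a slightly different dataset, so its idiosyncratic errors tend to be averaged away by the vote.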

Context

This concept has the prerequisites:

Goals

  • Know what the bagging procedure is.
  • Understand the motivation behind bagging, and for what sorts of algorithms you would expect it to improve performance (see the sketch after this list).
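
As a rough answer to the second goal, here is a standard variance calculation (e.g. from The Elements of Statistical Learning), under the simplifying assumption that the B bagged models' predictions each have variance \sigma^2 and pairwise correlation \rho:

    \mathrm{Var}\!\left( \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x) \right)
      = \rho \sigma^2 + \frac{1 - \rho}{B} \sigma^2

As B grows, the second term vanishes, so bagging helps most when the base algorithm is unstable enough that bootstrap resamples yield weakly correlated models; a stable algorithm produces nearly identical models, and averaging them buys little.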

Core resources (read/watch one of the following)

-Free-

Coursera: Machine Learning
An online machine learning course aimed at advanced undergraduates.
Author: Pedro Domingos
Other notes:
  • Click on "Preview" to see the videos.

Supplemental resources (the following are optional, but you may find them useful)

-Free-

The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Authors: Trevor Hastie, Robert Tibshirani, Jerome Friedman
Additional dependencies:
  • the bootstrap
  • bias-variance decomposition

See also

  • Random forests often achieve much better performance by introducing additional randomness (each tree split considers only a random subset of the features).
  • Boosting is another method for combining classifiers which sounds similar to bagging, but is aimed primarily at reducing bias rather than variance.
  • The bootstrap is another technique based on randomly resampling a dataset.