bagging
(1.6 hours to learn)
Summary
Bagging is a technique for reducing the variance of a learning algorithm by averaging the predictions obtained from bootstrap resamples of the training data. It can improve the performance of unstable algorithms such as decision trees.
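The procedure above can be sketched in a few lines. Here is a minimal NumPy illustration (names and data are illustrative, not from this page) that bags an unstable base learner, a depth-1 regression tree (decision stump), by fitting one stump per bootstrap resample and averaging their predictions:

```python
import numpy as np

def fit_stump(X, y):
    """Fit a regression stump on a 1-D feature: pick the split threshold
    minimizing squared error, predicting each side's mean. Returns
    (threshold, left_mean, right_mean)."""
    best, best_err = None, np.inf
    for t in np.unique(X)[:-1]:  # skip the largest value: no right side
        left, right = y[X <= t], y[X > t]
        pred = np.where(X <= t, left.mean(), right.mean())
        err = ((y - pred) ** 2).sum()
        if err < best_err:
            best_err, best = err, (t, left.mean(), right.mean())
    return best

def predict_stump(stump, X):
    t, left_mean, right_mean = stump
    return np.where(X <= t, left_mean, right_mean)

def bag(X, y, n_models=50, seed=None):
    """Bagging: fit each base model on a bootstrap resample
    (sampling n points with replacement) of the training set."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # with replacement
        models.append(fit_stump(X[idx], y[idx]))
    return models

def predict_bagged(models, X):
    """The bagged prediction is the average of the base models'."""
    return np.mean([predict_stump(m, X) for m in models], axis=0)
```

Because each stump sees a slightly different resample, the individual predictions vary, and averaging them smooths out that variability; this is exactly the variance reduction the summary refers to.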
Context
This concept has the prerequisites:
- decision trees (Decision trees are a major motivating example for bagging.)
- generalization (Bagging is a method for improving generalization.)
Goals
- Know what the bagging procedure is.
- Understand the motivation behind bagging, and for what sorts of algorithms you would expect it to improve performance.
Core resources (read/watch one of the following)
-Free-
→ Coursera: Machine Learning
An online machine learning course aimed at advanced undergraduates.
Other notes:
- Click on "Preview" to see the videos.
Supplemental resources (the following are optional, but you may find them useful)
-Free-
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Location:
Section 8.7, "Bagging," pages 282-288
Additional dependencies:
- the bootstrap
- bias-variance decomposition
See also
- Random forests often achieve much better performance by introducing additional randomness.
- Boosting is another method for combining classifiers which sounds similar to bagging, but is aimed at reducing bias rather than variance.
- The bootstrap is another technique based on randomly resampling a dataset.