boosting as optimization
(1.4 hours to learn)
Summary
AdaBoost can be interpreted as a sequential procedure for minimizing the exponential loss on the training set with respect to the coefficients of a particular basis function expansion. This leads to generalizations of the algorithm to different loss functions.
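A minimal sketch of this interpretation (the function names and the decision-stump base learner here are illustrative choices, not taken from the resources below): forward stagewise additive modeling under the exponential loss L(y, f(x)) = exp(-y f(x)). Choosing each coefficient by its closed-form minimizer reproduces the familiar AdaBoost re-weighting.

import numpy as np

def fit_stump(X, y, w):
    """Weighted-error-minimizing decision stump: (feature, threshold, polarity)."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best = err, (j, thr, pol)
    return best

def stump_predict(stump, X):
    j, thr, pol = stump
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def boost_exponential_loss(X, y, n_rounds=20):
    """Sequentially add stumps, each stage minimizing the exponential loss."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # weights induced by the current exponential loss
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)             # best basis function on the weighted data
        pred = stump_predict(stump, X)
        err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-12, 1 - 1e-12)
        beta = 0.5 * np.log((1 - err) / err)   # closed-form minimizer of the exponential loss
        w = w * np.exp(-beta * y * pred)       # same re-weighting rule as AdaBoost
        w = w / np.sum(w)
        ensemble.append((beta, stump))
    return ensemble

def predict(ensemble, X):
    F = sum(b * stump_predict(s, X) for b, s in ensemble)
    return np.sign(F)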
Context
This concept has the prerequisites:
- AdaBoost
- optimization problems
- basis function expansions (The optimization is with respect to the weights in a particular basis function expansion.)
Goals
- Derive AdaBoost as a sequential procedure to minimize the exponential loss on the training set (a sketch of the key step appears after this list).
- Based on this analysis, why might AdaBoost be especially sensitive to mislabeled training examples?
- Understand how the basic boosting procedure can be generalized to other loss functions.
- Why do we often re-estimate the weights of the base classifiers for more general boosting algorithms, but not for AdaBoost?
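A sketch of the key step behind the first two goals (the standard stagewise argument, stated here for orientation rather than as a substitute for the readings): at stage m the previously fitted terms f_{m-1} are held fixed, and the new coefficient and base classifier are chosen to minimize the exponential loss on the training set.

\begin{aligned}
(\beta_m, G_m) &= \arg\min_{\beta, G} \sum_{i=1}^{N} \exp\!\big(-y_i\,[f_{m-1}(x_i) + \beta\, G(x_i)]\big) \\
&= \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} \exp\!\big(-\beta\, y_i\, G(x_i)\big),
\qquad w_i^{(m)} = \exp\!\big(-y_i f_{m-1}(x_i)\big).
\end{aligned}

Splitting the sum over correctly and incorrectly classified points gives
e^{-\beta}\sum_{y_i = G(x_i)} w_i^{(m)} + e^{\beta}\sum_{y_i \ne G(x_i)} w_i^{(m)},
so G_m is the base classifier with the smallest weighted error \mathrm{err}_m, and setting the derivative with respect to \beta to zero yields
\beta_m = \tfrac{1}{2}\log\frac{1-\mathrm{err}_m}{\mathrm{err}_m}.
Since a mislabeled point tends to be misclassified at every stage, its weight w_i^{(m)} grows exponentially, which is the sensitivity the second goal asks about.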
Core resources (read/watch one of the following)
-Free-
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
- Section 10.2, "Boosting fits an additive model," pages 341-342
- Section 10.3, "Forward stagewise additive modeling," pages 342-343
- Section 10.4, "Exponential loss and AdaBoost," pages 343-344
- Section 10.5, "Why exponential loss?", pages 345-346
- Section 10.6, "Loss functions and robustness," pages 346-350
-Paid-
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
- Section 14.3.1, "Minimizing exponential error," pages 659-661
- Section 14.3.2, "Error functions for boosting," pages 661-663
See also
-No Additional Notes-