boosting as optimization
(1.4 hours to learn)
AdaBoost can be interpreted as a sequential procedure for minimizing the exponential loss on the training set with respect to the coefficients of a particular basis function expansion. This leads to generalizations of the algorithm to different loss functions.
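The stagewise-optimization view above can be sketched in code. The snippet below is a minimal illustration, not a production implementation: on a toy 1-D dataset (my own construction), the per-example exponential losses of the current ensemble serve as weights, and minimizing the loss over each new coefficient yields the familiar closed form alpha = 0.5 * log((1 - err) / err). The helper names `stump_predict` and `best_stump` are illustrative.

```python
import numpy as np

# Toy 1-D dataset (illustrative): y = +1 inside (-0.5, 0.5), else -1.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=200)
y = np.where(np.abs(X) < 0.5, 1.0, -1.0)

def stump_predict(x, t, s):
    """Decision stump h(x) = s * sign(x - t), breaking ties toward s."""
    p = s * np.sign(x - t)
    return np.where(p == 0, s, p)

def best_stump(X, y, w):
    """Exhaustively pick the stump with minimum weighted 0/1 error."""
    best = (np.inf, 0.0, 1.0)
    for t in X:
        for s in (1.0, -1.0):
            err = np.sum(w * (stump_predict(X, t, s) != y))
            if err < best[0]:
                best = (err, t, s)
    return best

# Forward stagewise minimization of the exponential loss
#   L(F) = sum_i exp(-y_i F(x_i)).
# At each round the per-example losses act as the weights w_i, and the
# coefficient that minimizes the loss is alpha = 0.5*log((1-err)/err).
w = np.full(len(X), 1.0 / len(X))
ensemble = []
for m in range(20):
    err, t, s = best_stump(X, y, w)
    err = max(err, 1e-10)                     # guard against log(0)
    alpha = 0.5 * np.log((1.0 - err) / err)
    ensemble.append((alpha, t, s))
    # Multiplying in exp(-alpha * y * h(x)) keeps w proportional to the
    # exponential loss of the updated ensemble -- this is exactly the
    # AdaBoost reweighting step.
    w *= np.exp(-alpha * y * stump_predict(X, t, s))
    w /= w.sum()

def F(x):
    """Additive model F(x) = sum_m alpha_m h_m(x)."""
    return sum(a * stump_predict(x, t, s) for a, t, s in ensemble)

train_acc = np.mean(np.sign(F(X)) == y)
```

Note how the reweighting step falls out of the loss rather than being postulated: misclassified points have a larger exponential loss, so they automatically receive more weight in the next round.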
Goals:
- Derive AdaBoost as a sequential procedure to minimize the exponential loss on the training set.
- Based on this analysis, why might AdaBoost be especially sensitive to mislabeled training examples?
- Understand how the basic boosting procedure can be generalized to other loss functions.
- Why do we often re-estimate the weights of the base classifiers for more general boosting algorithms, but not for AdaBoost?
Core resources (read/watch one of the following)
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
- Section 10.2, "Boosting fits an additive model," pages 341-342
- Section 10.3, "Forward stagewise additive modeling," pages 342-343
- Section 10.4, "Exponential loss and AdaBoost," pages 343-344
- Section 10.5, "Why exponential loss?", pages 345-346
- Section 10.6, "Loss functions and robustness," pages 346-350
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
- Section 14.3.1, "Minimizing exponential error," pages 659-661
- Section 14.3.2, "Error functions for boosting," pages 661-663