# boosting as optimization

(1.4 hours to learn)

## Summary

AdaBoost can be interpreted as a sequential procedure for minimizing the exponential loss on the training set with respect to the coefficients of a particular basis function expansion. This interpretation leads to generalizations of the algorithm to other loss functions.
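Concretely (a sketch of the standard formulation; the notation here is not taken from any one of the resources below), the boosted classifier is an additive expansion over base classifiers,

$$f(x) = \sum_{m=1}^{M} \alpha_m \, \phi_m(x),$$

and each round greedily minimizes the exponential loss over one new basis function and its coefficient, holding all earlier terms fixed:

$$(\alpha_m, \phi_m) = \operatorname*{arg\,min}_{\alpha,\, \phi} \sum_{i=1}^{N} \exp\!\bigl(-y_i \bigl[f_{m-1}(x_i) + \alpha\, \phi(x_i)\bigr]\bigr).$$

The factor $w_i^{(m)} = \exp(-y_i f_{m-1}(x_i))$ depends only on earlier rounds, which is why it reappears as AdaBoost's per-example weight; and because the weight on a repeatedly misclassified point grows exponentially, this view also suggests why mislabeled examples can come to dominate training.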

## Context

This concept has the prerequisites:

- AdaBoost
- optimization problems
- basis function expansions (The optimization is with respect to the weights in a particular basis function expansion.)

## Goals

- Derive AdaBoost as a sequential procedure to minimize the exponential loss on the training set. (A code sketch of this derivation follows the list.)

- Based on this analysis, why might AdaBoost be especially sensitive to mislabeled training examples?

- Understand how the basic boosting procedure can be generalized to other loss functions.

- Why do we often re-estimate the weights of the base classifiers for more general boosting algorithms, but not for AdaBoost?
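
The stagewise view can be made concrete in code. Below is a minimal sketch (my own illustration, not taken from the readings) of forward stagewise additive modeling under the exponential loss, assuming ±1 labels and decision stumps as base classifiers; the helper names `fit_stump` and `adaboost_exp_loss` are made up for this example:

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted 0/1-error-minimizing decision stump: for a fixed positive
    coefficient, minimizing the exponential loss over the new basis
    function reduces to minimizing weighted misclassification error."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = s * np.where(X[:, j] <= t, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, s)
    return best

def adaboost_exp_loss(X, y, M=20):
    """Forward stagewise additive modeling with exponential loss.
    The per-example weights w_i = exp(-y_i f_{m-1}(x_i)) (normalized)
    depend only on earlier stages; they are AdaBoost's example weights."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(M):
        err, j, t, s = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard the log below
        alpha = 0.5 * np.log((1 - err) / err)  # closed-form coefficient
        pred = s * np.where(X[:, j] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)  # one more exp(-y alpha phi(x)) factor
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    """Sign of the additive expansion f(x) = sum_m alpha_m phi_m(x)."""
    f = np.zeros(len(X))
    for alpha, j, t, s in ensemble:
        f += alpha * s * np.where(X[:, j] <= t, 1, -1)
    return np.sign(f)
```

The closed-form coefficient $\alpha_m = \frac{1}{2}\log\frac{1-\mathrm{err}_m}{\mathrm{err}_m}$ is available only because the exponential loss factorizes across stages; for other losses the coefficients generally have no closed form, which is one reason more general boosting procedures re-estimate them numerically.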

## Core resources (read/watch one of the following)

## -Free-

→ The Elements of Statistical Learning

A graduate-level statistical learning textbook with a focus on frequentist methods.

- Section 10.2, "Boosting fits an additive model," pages 341-342
- Section 10.3, "Forward stagewise additive modeling," pages 342-343
- Section 10.4, "Exponential loss and AdaBoost," pages 343-344
- Section 10.5, "Why exponential loss?", pages 345-346
- Section 10.6, "Loss functions and robustness," pages 346-350

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

- Section 14.3.1, "Minimizing exponential error," pages 659-661
- Section 14.3.2, "Error functions for boosting," pages 661-663

## See also

-No Additional Notes-