boosting as optimization

(1.4 hours to learn)


AdaBoost can be interpreted as a sequential procedure for minimizing the exponential loss on the training set with respect to the coefficients of a particular basis function expansion. This leads to generalizations of the algorithm to different loss functions.
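To make this concrete, here is a brief derivation sketch in notation introduced purely for illustration (labels y_i ∈ {−1, +1}, additive model F_m = F_{m−1} + α_m h_m); it is a summary of the standard argument, not drawn from any one of the resources below:

    \[
    \sum_{i=1}^{n} \exp\!\bigl(-y_i \,[F_{m-1}(x_i) + \alpha\, h(x_i)]\bigr)
    \;=\; \sum_{i=1}^{n} w_i^{(m)} \exp\!\bigl(-\alpha\, y_i\, h(x_i)\bigr),
    \qquad w_i^{(m)} = \exp\!\bigl(-y_i F_{m-1}(x_i)\bigr).
    \]

Minimizing over h(x) ∈ {−1, +1} selects the base classifier with the smallest weighted error err_m = Σ_i w_i^{(m)} 1[h(x_i) ≠ y_i] / Σ_i w_i^{(m)}, and setting the derivative with respect to α to zero gives the familiar coefficient α_m = (1/2) log((1 − err_m)/err_m). The exponentially growing weights on repeatedly misclassified points also hint at why mislabeled examples can have an outsized influence on the fit.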


Goals:



  • Derive AdaBoost as a sequential procedure to minimize the exponential loss on the training set.
  • Based on this analysis, why might AdaBoost be especially sensitive to mislabeled training examples?
  • Understand how the basic boosting procedure can be generalized to other loss functions (a sketch follows this list).
    • Why do we often re-estimate the weights of the base classifiers for more general boosting algorithms, but not for AdaBoost?
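Below is a minimal sketch of this generalization, assuming NumPy, SciPy, scikit-learn decision stumps, and a margin-based loss such as the logistic loss; the function name and details are illustrative, not taken from any particular reference. Because a general loss does not admit AdaBoost's closed-form coefficient, each base classifier's coefficient is re-estimated by a one-dimensional numerical search.

    # Illustrative sketch: forward stagewise additive modeling with a generic
    # margin-based loss. Assumes labels y in {-1, +1}; not a reference implementation.
    import numpy as np
    from scipy.optimize import minimize_scalar
    from sklearn.tree import DecisionTreeClassifier

    def stagewise_boost(X, y, loss, n_rounds=50):
        """loss(margin) -> per-example loss, e.g. np.log1p(np.exp(-m)) for the logistic loss."""
        F = np.zeros(len(y))          # current additive model F(x_i) on the training set
        ensemble = []
        for _ in range(n_rounds):
            # Weight each example by (approximately) the negative derivative of its loss,
            # the role played by w_i = exp(-y_i F(x_i)) under the exponential loss.
            eps = 1e-6
            w = (loss(y * F) - loss(y * F + eps)) / eps
            w = np.clip(w, 0.0, None)
            w /= w.sum()
            stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
            h = stump.predict(X)
            # No closed form here: re-estimate the coefficient by line search.
            alpha = minimize_scalar(lambda a: np.sum(loss(y * (F + a * h))),
                                    bounds=(0.0, 10.0), method="bounded").x
            F += alpha * h
            ensemble.append((alpha, stump))
        return ensemble

For example, stagewise_boost(X, y, lambda m: np.log1p(np.exp(-m))) runs the procedure with the logistic loss, while plugging in np.exp(-m) recovers the exponential-loss objective, where the line search simply rediscovers the closed-form AdaBoost coefficient derived above.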

Core resources (read/watch one of the following)



See also

-No Additional Notes-