random forests

(2.6 hours to learn)

Summary

Random forests are a machine learning algorithm that averages the predictions of many decision trees, each trained on a bootstrap sample of the data and restricted to a random subset of the input features at each split. They are widely used because they often perform very well with almost no parameter tuning.
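The algorithm described above can be sketched directly: train each tree on a bootstrap sample with per-split feature subsampling, then aggregate by majority vote. This is an illustrative sketch using scikit-learn's `DecisionTreeClassifier` on synthetic data (the tree count and feature count are arbitrary choices, not recommended settings):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic toy data, for illustration only
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

rng = np.random.default_rng(0)
n_trees, max_features = 25, 3  # illustrative values
trees = []
for i in range(n_trees):
    # Bootstrap sample: draw training rows with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # max_features makes the tree consider a random feature
    # subset at each split
    tree = DecisionTreeClassifier(max_features=max_features, random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Aggregate the trees' predictions by majority vote
votes = np.stack([t.predict(X) for t in trees])
pred = (votes.mean(axis=0) > 0.5).astype(int)
```

For regression, the vote is replaced by an average of the trees' predicted values.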

Context

This concept has the prerequisites:

Goals

  • Know the basic random forest algorithm
  • What effect does varying the number of features considered at each split have? What are the advantages of larger or smaller values?
  • How do you determine the relevance of each of the input features to the classification?
  • How do you estimate out-of-sample error as the training is progressing?
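The knobs mentioned in the goals above (the per-split feature count, feature relevance, and out-of-sample error estimation) are all exposed by scikit-learn's `RandomForestClassifier`; a brief sketch on synthetic data (parameter values are illustrative, not tuned):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic toy data, for illustration only
X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           random_state=0)

# oob_score=True reuses the out-of-bag rows of each tree's bootstrap
# sample to estimate out-of-sample accuracy without a held-out set
forest = RandomForestClassifier(n_estimators=100,
                                max_features=3,   # features tried per split
                                oob_score=True,
                                random_state=0)
forest.fit(X, y)

print(forest.oob_score_)            # out-of-bag accuracy estimate
print(forest.feature_importances_)  # impurity-based feature relevance
```

`feature_importances_` here is the impurity-based measure; permutation-based importance is a common alternative that is less biased toward high-cardinality features.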

Core resources (read/watch one of the following)

-Free-

The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Authors: Trevor Hastie, Robert Tibshirani, Jerome Friedman

Supplemental resources (the following are optional, but you may find them useful)

-Free-

Decision forests: a unified framework for classification, regression, density estimation, manifold learning, and semi-supervised learning
Authors: Antonio Criminisi, Jamie Shotton, Ender Konukoglu
Machine learning - random forests
Author: Nando de Freitas

See also

  • Boosting is another ensemble method commonly applied to decision trees.
  • Dropout is a regularization technique for neural networks inspired by random forests.