random forests
(2.6 hours to learn)
Summary
Random forests are a machine learning algorithm that averages the predictions of many decision trees, each trained on a bootstrap resample of the data and restricted to a random subset of the input features. They are widely used because they often perform very well with almost no parameter tuning.
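The summary above can be sketched in a few lines of code. The following is a minimal illustration, not the full algorithm from the readings: it uses depth-1 "stumps" instead of full decision trees, a toy synthetic dataset, and hand-picked thresholds, but it shows the three ingredients named in the prerequisites — bootstrap resampling (bagging), random feature subsets, and averaging (here, majority voting) over the ensemble.

```python
import random

random.seed(0)

# Toy dataset (illustrative, not from the readings): two features in [0, 1],
# label 1 when their sum exceeds 1, so each feature alone is weakly predictive.
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if x[0] + x[1] > 1 else 0 for x in X]

def train_stump(X, y, features):
    """Fit a depth-1 'tree': the (feature, threshold, orientation) with the
    fewest training errors, searching only the given feature subset."""
    best = None
    for f in features:
        for t in (0.25, 0.5, 0.75):          # coarse candidate thresholds
            for lo, hi in ((0, 1), (1, 0)):  # label on each side of the split
                errs = sum((hi if x[f] > t else lo) != yi
                           for x, yi in zip(X, y))
                if best is None or errs < best[0]:
                    best = (errs, f, t, lo, hi)
    _, f, t, lo, hi = best
    return lambda x: hi if x[f] > t else lo

def train_forest(X, y, n_trees=25, n_features=1):
    forest, n = [], len(X)
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]        # bootstrap resample (bagging)
        feats = random.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(train_stump([X[i] for i in idx],
                                  [y[i] for i in idx], feats))
    return forest

def predict(forest, x):
    # Aggregate the individual trees' predictions by majority vote.
    return 1 if sum(tree(x) for tree in forest) * 2 >= len(forest) else 0

forest = train_forest(X, y)
accuracy = sum(predict(forest, x) == yi for x, yi in zip(X, y)) / len(X)
```

Each stump on its own is a weak, high-variance classifier fit to a noisy resample; because the trees are trained on different resamples and different feature subsets, their errors are partly decorrelated, so the vote of the ensemble is more accurate than a typical individual tree.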
Context
This concept has the prerequisites:
- decision trees (Random forests are ensembles of decision trees.)
- bagging (Bagging is a part of the random forest algorithm.)
- generalization (Averaging over random resamplings is intended to improve generalization performance.)
- expectation and variance (The variance of the individual classifiers is important for understanding random forests.)
Goals
- Know the basic random forest algorithm
- What effect does varying the number of features considered at each split have? What are the advantages of larger or smaller values?
- How do you determine the relevance of each of the input features to the classification?
- How do you estimate out-of-sample error as training progresses?
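The last three goals can be explored empirically. The sketch below uses scikit-learn (not one of the listed resources) on an assumed toy dataset where only the first two of five features carry signal: `max_features` controls the size of the random feature subset at each split (smaller values decorrelate the trees but weaken each one), `feature_importances_` exposes the learned relevance of each input feature, and `oob_score=True` estimates out-of-sample error from the out-of-bag samples that each tree's bootstrap resample left out.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): 5 features, but the label
# depends only on the first two, so the rest are pure noise.
X = rng.random((300, 5))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

forest = RandomForestClassifier(
    n_estimators=200,
    max_features=2,   # features considered at each split (the key tuning knob)
    oob_score=True,   # score each sample with the trees that never saw it
    random_state=0,
).fit(X, y)

# Out-of-bag accuracy: a generalization estimate with no held-out set.
print("OOB accuracy:", round(forest.oob_score_, 3))
# Normalized importances; the two informative features should dominate.
print("Feature importances:", forest.feature_importances_.round(3))
```

Varying `max_features` here (from 1 up to all 5) and watching `oob_score_` is a quick way to build intuition for the trade-off asked about in the goals, without ever touching a test set.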
Core resources (read/watch one of the following)
-Free-
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Supplemental resources (the following are optional, but you may find them useful)
-Free-
→ Decision forests: a unified framework for classification, regression, density estimation, manifold learning, and semi-supervised learning
→ Random forests
→ Machine learning - random forests