(2.6 hours to learn)
Random forests are a machine learning algorithm that averages the predictions of an ensemble of decision trees, each trained on a bootstrap resample of the data and restricted to a random subset of the input features at each split. They are widely used because they often perform very well with almost no parameter tuning.
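The averaging idea can be sketched in a few lines. This is a minimal illustration, not a production implementation: it assumes scikit-learn is available, uses its DecisionTreeClassifier as the base learner, and evaluates only on the training set.

```python
# Minimal random forest sketch: each tree is fit on a bootstrap
# resample of the data, and each split considers only a random
# subset of the features (max_features="sqrt"). Predictions are
# averaged by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

trees = []
for seed in range(25):
    # bootstrap resample of the training set (bagging)
    idx = rng.integers(0, len(X), size=len(X))
    # restrict each split to a random subset of sqrt(d) features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# majority vote over the individual tree predictions
votes = np.mean([t.predict(X) for t in trees], axis=0)
forest_pred = (votes >= 0.5).astype(int)
print("training accuracy:", (forest_pred == y).mean())
```

Because each tree sees a different resample and different feature subsets, the trees make partially independent errors, which the vote averages away.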
This concept has the prerequisites:
- decision trees (Random forests are ensembles of decision trees.)
- bagging (Bagging is a part of the random forest algorithm.)
- generalization (Averaging over random resamplings is intended to improve generalization performance.)
- expectation and variance (The variance of the individual classifiers is important for understanding random forests.)
Goals:
- Know the basic random forest algorithm
- What effect does varying the number of features have? What are the advantages of larger or smaller values?
- How do you determine the relevance of each of the input features to the classification?
- How do you estimate out-of-sample error as the training is progressing?
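The last two questions above can be explored directly, assuming scikit-learn: its RandomForestClassifier exposes out-of-bag error estimation (each tree is evaluated on the bootstrap samples it never saw) and impurity-based feature importances.

```python
# Sketch of out-of-bag error estimation and feature relevance
# using scikit-learn's built-in attributes.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8,
                           n_informative=3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0)
forest.fit(X, y)

# oob_score_ estimates out-of-sample accuracy without a held-out set
print("OOB accuracy estimate:", forest.oob_score_)
# feature_importances_ ranks each input feature's relevance (sums to 1)
print("feature importances:", forest.feature_importances_.round(3))
```

Since out-of-bag predictions use only trees that did not train on a given sample, the estimate is available while training, with no separate validation split.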
Core resources (read/watch one of the following)
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
Supplemental resources (the following are optional, but you may find them useful)
→ Decision forests: a unified framework for classification, regression, density estimation, manifold learning, and semi-supervised learning
→ Random forests
→ Machine learning - random forests