
Bayesian machine learning
author: Roger Grosse
An overview of Bayesian machine learning techniques.

Differential geometry for machine learning
author: Roger Grosse
An overview of several uses of differential geometry ideas in machine learning.

Coursera: Machine Learning
author: Colorado Reed
A supplement to Andrew Ng's Coursera machine learning course

Dynamical Systems for Machine Learning
author: Daniel Jiwoong Im
An overview of taking dynamical systems approaches to learning.

Stanford CS229: Machine Learning
author: Roger Grosse
CS229 is Stanford's graduate course in machine learning, currently taught by Andrew Ng. It provides an overview of techniques for supervised, unsupervised, and reinforcement learning, as well as some results from computational learning theory.

LevelUp Your Machine Learning
author: Colorado Reed
This roadmap provides an answer to the question: How can I "get better" at machine learning when I don't know what I should study?

Deep learning from the bottom up
author: Roger Grosse
An overview of key ideas and recent advances in deep learning.

K nearest neighbors
K nearest neighbors is one of the simplest machine learning algorithms: it predicts by averaging the labels of the K nearest neighbors in the training set. It is a canonical example of a nonparametric learning algorithm. It has the advantage that it can learn arbitrarily complex functions, but it is especially sensitive to the curse of dimensionality. 
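As a rough illustration, here is a minimal pure-Python sketch of KNN regression, which averages the labels of the K closest training points under Euclidean distance (all function names here are illustrative, not from any particular library):

```python
import math

def knn_regress(train_X, train_y, query, k=3):
    """Predict by averaging the labels of the k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    # Average the labels of the k closest points.
    neighbors = [y for _, y in dists[:k]]
    return sum(neighbors) / k
```

Replacing the average with a majority vote gives the classification variant. Note that every prediction scans the whole training set, which is part of why the method degrades in high dimensions.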
restricted Boltzmann machines
Restricted Boltzmann machines (RBMs) are a type of undirected graphical model typically used for learning binary feature representations. The structure consists of a bipartite graph with a layer of visible units to represent the inputs and a layer of hidden units to represent more abstract features. Training is intractable, but approximations such as contrastive divergence work well in practice. RBMs are a building block of many models in deep learning. 
multiple integrals
A multiple integral generalizes integration to functions of n variables and produces a general n-dimensional volume. For instance, n = 2 corresponds to an area. Multiple integrals occur frequently in probability theory and machine learning when examining marginal densities. 
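Since multiple integrals rarely have closed forms in practice, a common trick is to estimate them numerically. As a small illustration (assuming the unit square as the domain; the function name is ours), here is a Monte Carlo estimate of a double integral:

```python
import random

def mc_double_integral(f, n=200_000, seed=0):
    """Monte Carlo estimate of the integral of f over the unit square [0,1]^2.

    Since the square has area 1, the integral equals the expected value of
    f at a uniformly random point, which we approximate by a sample average.
    """
    rng = random.Random(seed)
    total = sum(f(rng.random(), rng.random()) for _ in range(n))
    return total / n
```

For example, the integral of x*y over the unit square is exactly 1/4, and the estimate should land close to that. The same idea underlies how marginal densities are approximated when the integral over the remaining variables is intractable.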
early stopping
Early stopping is a technique for controlling overfitting in machine learning models, especially neural networks, by stopping training before the weights have converged. Often we stop when the performance has stopped improving on a held-out validation set. 
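The stopping rule is easy to sketch. Below is a minimal illustration of the common "patience" variant (the function names and the patience parameter are illustrative; `step` runs one epoch of training and `val_loss` evaluates on the held-out set):

```python
def train_with_early_stopping(step, val_loss, max_epochs=100, patience=5):
    """Stop once validation loss has not improved for `patience` epochs.

    Returns the epoch with the best validation loss and that loss.
    """
    best, best_epoch, stale = float("inf"), 0, 0
    for epoch in range(max_epochs):
        step()                      # one epoch of training
        loss = val_loss()           # evaluate on held-out data
        if loss < best:
            best, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:   # no improvement for `patience` epochs
                break
    return best_epoch, best
```

In practice one also snapshots the weights at each improvement and restores the best snapshot after stopping.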
Bayesian parameter estimation: Gaussian distribution
Using the Bayesian framework, we can infer the mean parameter of a Gaussian distribution, the scale parameter, or both. Since Gaussians are widely used in probabilistic modeling, the computations that go into this are common motifs in Bayesian machine learning more generally. 
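For the case of inferring the mean with known observation variance, conjugacy makes the posterior available in closed form. A minimal sketch (the function name and parameterization are ours; variances, not precisions, are taken as inputs):

```python
def posterior_mean_params(data, mu0, sigma0_sq, sigma_sq):
    """Posterior over the mean mu of a Gaussian with known variance sigma_sq,
    under the conjugate prior mu ~ N(mu0, sigma0_sq).

    Precisions (inverse variances) add, and the posterior mean is a
    precision-weighted average of the prior mean and the data.
    """
    n = len(data)
    prec = 1.0 / sigma0_sq + n / sigma_sq
    mean = (mu0 / sigma0_sq + sum(data) / sigma_sq) / prec
    return mean, 1.0 / prec   # posterior mean and posterior variance
```

As more data arrive, the data term dominates the prior term and the posterior variance shrinks toward zero, which matches the intuitive behavior of Bayesian updating.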
loss function
A loss function or cost function is a function that maps the outcome of a decision to a real-valued cost associated with that outcome. Loss functions are common in machine learning, information theory, statistics, and mathematical optimization, and help guide decision making under uncertainty. 
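Two standard examples, and the expected loss a decision-maker minimizes under uncertainty, can be written directly (a minimal sketch; the function names are illustrative):

```python
def squared_error(prediction, target):
    """Common loss for regression."""
    return (prediction - target) ** 2

def zero_one_loss(prediction, target):
    """Common loss for classification: 1 for a mistake, 0 otherwise."""
    return 0 if prediction == target else 1

def expected_loss(loss, decision, outcomes):
    """Expected loss of `decision` under a distribution over outcomes,
    given as (probability, outcome) pairs. Decision theory says to pick
    the decision minimizing this quantity."""
    return sum(p * loss(decision, y) for p, y in outcomes)
```

For instance, under squared error and a 50/50 chance that the target is 0 or 1, predicting 0.5 incurs expected loss 0.25, which is the minimum possible for that distribution.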
random forests
Random forests are a machine learning algorithm which averages the predictions of an ensemble of decision trees, each trained with randomness, e.g. on a bootstrap sample of the data and restricted to random subsets of the input features. They are widely used because they often perform very well with almost no parameter tuning. 
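The bootstrap-plus-random-features recipe can be illustrated with an intentionally tiny version: each "tree" here is a one-split regression stump on a single randomly chosen feature, trained on a bootstrap resample. Real random forests grow full trees and randomize features at every split; this is only a sketch, and all names are ours:

```python
import random, statistics

def train_stump(X, y, feature):
    """Regression stump: split one feature at its median, predict the mean on each side."""
    thresh = statistics.median(x[feature] for x in X)
    left = [yi for xi, yi in zip(X, y) if xi[feature] <= thresh]
    right = [yi for xi, yi in zip(X, y) if xi[feature] > thresh]
    left_mean = statistics.mean(left) if left else statistics.mean(y)
    right_mean = statistics.mean(right) if right else statistics.mean(y)
    return feature, thresh, left_mean, right_mean

def stump_predict(stump, x):
    feature, thresh, left_mean, right_mean = stump
    return left_mean if x[feature] <= thresh else right_mean

def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample
        feature = rng.randrange(len(X[0]))                    # random feature choice
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def forest_predict(forest, x):
    """Average the predictions of all trees."""
    return statistics.mean(stump_predict(t, x) for t in forest)
```

Averaging many noisy, decorrelated trees reduces variance, which is the core reason forests outperform a single tree.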
constructing kernels
The kernel trick allows us to reformulate linear machine learning models in terms of a kernel function which defines a notion of similarity between data points. A few simple rules allow us to construct kernels which capture a wide variety of similarity functions. 
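A few of those construction rules in action: starting from the linear kernel (a plain dot product), positive scaling, sums, products, and exponentiation all preserve validity, which is enough to build the polynomial and RBF kernels. A minimal sketch with illustrative names:

```python
import math

def linear_kernel(x, z):
    """The dot product: the simplest valid kernel."""
    return sum(xi * zi for xi, zi in zip(x, z))

def poly_kernel(x, z, c=1.0, d=3):
    """(x.z + c)^d: adding a nonnegative constant, and taking sums and
    products of kernels, both preserve validity."""
    return (linear_kernel(x, z) + c) ** d

def make_rbf(gamma):
    """RBF kernel: the exponential of a (scaled) kernel expression is
    again a kernel, by the closure rules."""
    def k(x, z):
        sq_dist = (linear_kernel(x, x) - 2 * linear_kernel(x, z)
                   + linear_kernel(z, z))  # ||x - z||^2
        return math.exp(-gamma * sq_dist)
    return k
```

Note the RBF kernel equals 1 exactly when the two points coincide and decays toward 0 as they move apart, so it behaves as a smooth similarity function.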
covariance matrices
A covariance matrix generalizes the idea of variance to multiple dimensions: the (i, j)th element of the covariance matrix is the covariance between the ith and jth random variables. Covariance matrices are common throughout both statistics and machine learning and often arise when dealing with multivariate distributions. 
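The definition translates directly into code. Here is a minimal sketch computing the sample covariance matrix of a dataset given as rows of observations (the function name is ours; we use the unbiased n-1 denominator):

```python
def covariance_matrix(data):
    """Sample covariance matrix: entry (i, j) is the sample covariance
    between columns i and j of `data` (one row per observation)."""
    n = len(data)
    d = len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(d)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(d)]
            for i in range(d)]
```

The diagonal entries are the per-variable variances, and the matrix is always symmetric and positive semidefinite.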
LASSO
The Lasso is a form of regularized linear regression. Unlike ridge regression, it puts an L1 penalty on the weights, which encourages sparsity, i.e. it encourages most of the weights to be exactly zero. The general trick of using L1 norms to encourage sparsity is widely used in machine learning. 
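The sparsity mechanism is visible in the optimizer itself: coordinate descent on the Lasso objective updates one weight at a time with a soft-thresholding step, which sets small coefficients exactly to zero. A minimal sketch, assuming no intercept and no feature normalization (the function names are ours):

```python
def soft_threshold(rho, lam):
    """Proximal operator of the L1 penalty: shrinks toward zero and
    returns exactly zero when |rho| <= lam. This is where sparsity comes from."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Cyclic coordinate descent for the Lasso objective
    sum_i (y_i - w.x_i)^2 / 2 + lam * ||w||_1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            # Partial residual: remove feature j's current contribution.
            resid = [y[i] - sum(w[k] * X[i][k] for k in range(d) if k != j)
                     for i in range(n)]
            rho = sum(X[i][j] * resid[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            w[j] = soft_threshold(rho, lam) / z
    return w
```

With ridge regression, the analogous update would only scale the weight down, never zeroing it out; the hard zero in `soft_threshold` is the defining difference of the L1 penalty.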
logistic regression
Logistic regression is a machine learning model for binary classification, i.e. learning to classify data points into one of two categories. It's a linear model, in that the decision depends only on the dot product of a weight vector with a feature vector. This means the classification boundary can be represented as a hyperplane. It's a widely used model in its own right, and the general structure of linear-followed-by-sigmoid is a common motif in neural networks. 
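The linear-followed-by-sigmoid structure, trained by gradient descent on the cross-entropy loss, fits in a few lines. A minimal pure-Python sketch (function names are ours; the first input column plays the role of a bias feature in the example below):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_prob(w, x):
    """Linear score w.x followed by a sigmoid: the model's p(y=1 | x)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train_logistic(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the negative log-likelihood.

    The gradient has the simple form sum_i (p_i - y_i) x_i."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        grad = [0.0] * d
        for x, yi in zip(X, y):
            err = predict_prob(w, x) - yi
            for j in range(d):
                grad[j] += err * x[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w
```

Thresholding the predicted probability at 0.5 recovers the hyperplane decision boundary w.x = 0 mentioned above.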
Bayesian parameter estimation: multivariate Gaussians
Using the Bayesian framework, we can infer the posterior over the mean vector of a multivariate Gaussian, the covariance matrix, or both. Since multivariate Gaussians are widely used in probabilistic modeling, the computations that go into this are common motifs in Bayesian machine learning more generally. 
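In the special case where the known covariance is diagonal (an assumption made here purely to keep the sketch matrix-free), each coordinate of the mean updates independently, exactly like the scalar Gaussian case; the general case replaces these scalar precisions with precision matrices. The function name and parameterization below are ours:

```python
def posterior_mean_mvn(data, mu0, prior_var, obs_var):
    """Posterior over the mean of a multivariate Gaussian with known
    *diagonal* covariance, under an independent Gaussian prior per
    coordinate: mu_k ~ N(mu0[k], prior_var[k]).

    Each coordinate's precisions add, and its posterior mean is a
    precision-weighted average of prior mean and data."""
    n = len(data)
    post_mean, post_var = [], []
    for k in range(len(mu0)):
        prec = 1.0 / prior_var[k] + n / obs_var[k]
        mean_k = (mu0[k] / prior_var[k] + sum(x[k] for x in data) / obs_var[k]) / prec
        post_mean.append(mean_k)
        post_var.append(1.0 / prec)
    return post_mean, post_var
```

Inferring the covariance matrix itself requires a different conjugate family (inverse-Wishart), which is not shown here.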
decision trees
Decision trees are a kind of tree-structured model used in machine learning and data mining. Each leaf node corresponds to a prediction, and each internal node divides the data points into two or more sets depending on the value of one of the input variables. Decision trees are widely used because of their simplicity and their ability to handle heterogeneous input features.
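A small sketch of the standard greedy construction for a binary classification tree, using Gini impurity to score splits (one common choice among several; all names are ours). Internal nodes are 4-tuples and leaves are plain labels:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: probability two random draws disagree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Greedy search over (feature, threshold) pairs minimizing weighted impurity."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def build_tree(X, y, depth=0, max_depth=3):
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]   # leaf: majority label
    split = best_split(X, y)
    if split is None:
        return Counter(y).most_common(1)[0][0]
    _, j, t = split
    left = [(xi, yi) for xi, yi in zip(X, y) if xi[j] <= t]
    right = [(xi, yi) for xi, yi in zip(X, y) if xi[j] > t]
    return (j, t,
            build_tree([x for x, _ in left], [yl for _, yl in left], depth + 1, max_depth),
            build_tree([x for x, _ in right], [yr for _, yr in right], depth + 1, max_depth))

def tree_predict(node, x):
    """Follow splits from the root until reaching a leaf label."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node
```

Because each split inspects one input variable at a time, the same code handles features on very different scales without any normalization, which is one reason trees cope well with heterogeneous inputs.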