# linear regression as maximum likelihood

(45 minutes to learn)

## Summary

One way to solve a standard linear regression problem, y=w*x, is to assume that the likelihood of each observed y, p(y; w*x, sigma^2), is Gaussian with mean w*x and variance sigma^2. This assumption means that we believe each observed y is a deterministic linear function of x plus random noise: y = w*x + e, where e is zero-mean Gaussian noise with variance sigma^2, drawn independently for each observation. Maximizing the likelihood with respect to w is then equivalent to minimizing the sum-of-squares error, Sum[(y-w*x)^2] over all (x, y) pairs, which has a closed-form solution. The value of sigma does not change which w maximizes the likelihood, so it is usually treated as known.
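The equivalence described above can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes a scalar weight w with no intercept, and the function names (`fit_least_squares`, `gaussian_nll`), the synthetic data, and the values of `true_w` and `sigma` are all illustrative choices that do not come from the resources listed below. It computes the closed-form least-squares estimate and confirms that moving w away from it in either direction raises the Gaussian negative log-likelihood, whatever sigma is assumed.

```python
import math
import random


def fit_least_squares(xs, ys):
    """Closed-form minimizer of Sum[(y - w*x)^2] for a scalar weight w."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def gaussian_nll(w, xs, ys, sigma):
    """Negative log-likelihood of ys under the model y ~ N(w*x, sigma^2)."""
    n = len(xs)
    sse = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + sse / (2 * sigma ** 2)


# Synthetic data from y = w*x + e; these values are assumptions for the demo.
random.seed(0)
true_w, sigma = 2.0, 0.5
xs = [random.uniform(-3, 3) for _ in range(200)]
ys = [true_w * x + random.gauss(0, sigma) for x in xs]

w_hat = fit_least_squares(xs, ys)

# The least-squares estimate also minimizes the negative log-likelihood,
# and this holds for every assumed noise level, not just the true one.
for assumed_sigma in (0.1, 0.5, 5.0):
    best = gaussian_nll(w_hat, xs, ys, assumed_sigma)
    assert best < gaussian_nll(w_hat + 0.01, xs, ys, assumed_sigma)
    assert best < gaussian_nll(w_hat - 0.01, xs, ys, assumed_sigma)

print("estimated w:", w_hat)
```

Only the `sse` term of the negative log-likelihood depends on w, which is why the two objectives share a minimizer; sigma rescales that term and adds a constant, so it changes the likelihood's value but not the location of its optimum.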

## Context

This concept has the prerequisites:

- linear regression (We're interpreting linear regression as maximum likelihood.)
- maximum likelihood (We're interpreting linear regression as maximum likelihood.)

## Core resources (read/watch one of the following)

## -Free-

→ Mathematical Monk: Machine Learning (2011)

Online videos on machine learning.

Location:
Lecture 9.4: MLE for linear regression

Other notes:

- detailed derivation of maximum likelihood estimator

→ Stanford's Machine Learning lecture notes

Lecture notes for Stanford's machine learning course, aimed at graduate and advanced undergraduate students.

Other notes:

- quick summary of maximum likelihood estimator

## Supplemental resources (the following are optional, but you may find them useful)

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location:
Sections 3.1.1-3.1.2, pgs. 140-144

## See also

- Viewing linear regression as maximum likelihood estimation leads to a number of generalizations, including generalized linear models, of which linear regression is a special case.