(1.8 hours to learn)
The Lasso is a form of regularized linear regression. Unlike ridge regression, it puts an L1 penalty on the weights, which encourages sparsity, i.e. it encourages most of the weights to be exactly zero. The general trick of using L1 norms to encourage sparsity is widely used in machine learning.
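As an illustrative sketch (not from the resources above), here is how the sparsity effect shows up in practice: fitting scikit-learn's Lasso on data where only a few features matter drives most of the learned weights to exactly zero. The data dimensions, the choice of alpha, and the true weights below are arbitrary assumptions for the demo.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.normal(size=(n, d))

# Only the first 3 of 20 features carry signal (assumed for the demo).
true_w = np.zeros(d)
true_w[:3] = [2.0, -3.0, 1.5]
y = X @ true_w + 0.1 * rng.normal(size=n)

# alpha scales the L1 penalty; larger alpha means more zeros.
model = Lasso(alpha=0.1).fit(X, y)

# Most entries of model.coef_ are exactly zero, unlike ridge regression,
# whose L2 penalty only shrinks weights toward zero.
print(np.sum(model.coef_ != 0), "nonzero weights out of", d)
```

Rerunning with a ridge penalty (`sklearn.linear_model.Ridge`) on the same data typically leaves all 20 coefficients nonzero, which is the contrast the L1 penalty is prized for.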
Core resources (read/watch one of the following)
→ The Elements of Statistical Learning
A graduate-level statistical learning textbook with a focus on frequentist methods.
→ Machine Learning: a Probabilistic Perspective
A very comprehensive graduate-level machine learning textbook.
Location: Sections 13.3-13.3.4, pgs. 429-438
Supplemental resources (the following are optional, but you may find them useful)
→ Pattern Recognition and Machine Learning
A textbook for a graduate machine learning course, with a focus on Bayesian methods.
Location: Section 3.1.4, pgs. 144-146
- Ridge regression is another regularized version of linear regression, using an L2 penalty instead of L1.
- The Lasso encourages sparsity of the weight vector. If we believe certain features are likely to be important as a group, we can use group sparsity instead.
- Some algorithms for optimizing the Lasso objective include:
- stochastic gradient descent
- least angle regression (LARS)
- Fast Iterative Shrinkage-Thresholding Algorithm (FISTA)
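FISTA is an accelerated variant of the simpler ISTA (iterative shrinkage-thresholding) algorithm, which alternates a gradient step on the squared error with soft-thresholding, the proximal operator of the L1 norm. A minimal ISTA sketch (not FISTA itself; function names and the fixed step size are assumptions for illustration):

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrinks each entry toward zero
    # and sets entries with |v_i| <= t exactly to zero.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iters=500):
    # Minimizes 0.5 * ||Xw - y||^2 + lam * ||w||_1.
    # Step size 1/L, where L = ||X||_2^2 is the Lipschitz constant
    # of the gradient of the smooth term.
    L = np.linalg.norm(X, ord=2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y)           # gradient step on the smooth part
        w = soft_threshold(w - grad / L, lam / L)  # L1 proximal step
    return w
```

FISTA adds a Nesterov-style momentum term on top of this same update, improving the convergence rate from O(1/k) to O(1/k^2).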