# k-means

(1.5 hours to learn)

## Summary

K-means is a clustering algorithm, i.e., a way of partitioning a set of data points into "clusters," or sets of data points which are similar to one another. It works by alternately reassigning each data point to its nearest cluster center and recomputing each center as the mean of the points assigned to it. It is commonly used for vector quantization and as an initialization for Gaussian mixture models.
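The two alternating steps described above (Lloyd's algorithm) can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function name and the simple random initialization are assumptions for the sketch.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest center and recomputing each center as its cluster's mean."""
    rng = np.random.default_rng(seed)
    # Initialize centers as k distinct random data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: label each point with its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        # (keeping the old center if a cluster happens to be empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels
```

Because both steps can only decrease (or leave unchanged) the objective, the loop is guaranteed to converge, though possibly to a poor local optimum.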

## Context

-this concept has no prerequisites-

## Goals

- What objective function does the k-means algorithm minimize?

- What happens during the two stages of each k-means iteration?

- Is it possible that the k-means algorithm increases the objective function in a full iteration?

- What are some scenarios in which k-means gets stuck in poor local optima?
- What are some general techniques to overcome these scenarios?

- What are some techniques for choosing the initial points of the clusters?

- What are some techniques for choosing the number of clusters?
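For reference while working through the goals above, the objective k-means minimizes (the "distortion measure," in the notation of Bishop's Section 9.1) is

```latex
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \,\lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2
```

where $r_{nk} \in \{0, 1\}$ indicates whether point $\mathbf{x}_n$ is assigned to cluster $k$, and $\boldsymbol{\mu}_k$ is the center of cluster $k$. The assignment step minimizes $J$ over the $r_{nk}$ with the centers fixed; the update step minimizes $J$ over the $\boldsymbol{\mu}_k$ with the assignments fixed.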

## Core resources (read/watch one of the following)

## -Free-

→ Information Theory, Inference, and Learning Algorithms

A graduate-level textbook on machine learning and information theory.

→ Coursera: Machine Learning (2013)

An online machine learning course aimed at a broad audience.

Location:
Lecture sequence "Clustering"

Other notes:

- Click on "Preview" to see the videos.

→ K-Means Clustering and Related Algorithms

## -Paid-

→ Pattern Recognition and Machine Learning

A textbook for a graduate machine learning course, with a focus on Bayesian methods.

Location:
Section 9.1, pages 424-430

## Supplemental resources (the following are optional, but you may find them useful)

## -Free-

→ Visualizing K-Means Clustering

Other notes:

- provides an interactive visualization of the k-means algorithm for various datasets

→ StackOverflow Question: Kmeans without knowing the number of clusters?

Other notes:

- the top answer provides a good set of links that dig into the question "how many clusters should I use?" (NB: this question is a Pandora's box)

## -Paid-

→ Machine Learning: a Probabilistic Perspective

A very comprehensive graduate-level machine learning textbook.

Location:
Sections 11.4.2.5-11.4.2.7, pages 352-355

Additional dependencies:

- mixture of Gaussians models

## See also

- using the k-means++ algorithm to select the initial cluster centers provides an optimality guarantee on the final objective, as well as better empirical performance than random initialization
- Other clustering methods include:
  - mixture of Gaussians, a probabilistic model for continuous data
  - mixture of Bernoullis, a probabilistic model for discrete data
  - spectral clustering, for when the clusters don't have nice convex shapes

- K-means is used in information theory for quantizing continuous signals; there it is known as vector quantization.
- K-means is often used as an initialization for more sophisticated models such as mixture of Gaussians.
- K-means is analogous to the EM algorithm for fitting Gaussian mixtures.
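The k-means++ seeding mentioned above can be sketched as follows. This is a minimal version for illustration (the function name is an assumption); it implements the core idea of Arthur and Vassilvitskii's method: choose each new center with probability proportional to its squared distance from the nearest center chosen so far.

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """k-means++ seeding: pick the first center uniformly at random, then
    pick each subsequent center with probability proportional to its
    squared distance from the nearest already-chosen center."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```

Points far from all current centers are much more likely to be picked, so the initial centers tend to be spread out across the data, which is what yields the improvement over uniform random initialization.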