K-Means

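K-Means is a non-probabilistic clustering algorithm that partitions data points into a specified number K of clusters. It assigns each point to its nearest centroid and chooses the centroids to minimize the total within-cluster sum of squared distances. It can nonetheless be viewed as a limiting case of the EM algorithm for a mixture of K Gaussians that share an isotropic covariance σ²I: as σ → 0, the soft responsibilities of the E-step become hard nearest-centroid assignments, and the M-step reduces to recomputing each centroid as the mean of its assigned points.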
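The following is a minimal NumPy sketch of the standard Lloyd iteration for K-Means. It assumes Euclidean distance and initializes the centroids with randomly chosen data points. The function names `kmeans` and `responsibilities` are hypothetical and used only for illustration. `kmeans` alternates a hard assignment step with a centroid update. `responsibilities` computes the E-step of an equal-weight Gaussian mixture with shared covariance σ²I. For a small σ, it collapses to the same nearest-centroid assignment, which illustrates the EM connection mentioned above.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate hard assignment and centroid update.

    Returns (centroids, labels, history); history[t] is the within-cluster
    sum of squared distances right after the t-th assignment step.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points (a simple choice;
    # k-means++ seeding is a common alternative).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels, history = None, []
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from each point to each centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        history.append(d2[np.arange(len(X)), new_labels].sum())
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids, labels, history


def responsibilities(X, centroids, sigma):
    """E-step of an equal-weight Gaussian mixture with shared covariance sigma^2 * I."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / (2.0 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    r = np.exp(logits)
    return r / r.sum(axis=1, keepdims=True)


# Two well-separated synthetic blobs (made-up data for illustration).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
centroids, labels, history = kmeans(X, k=2)
print(centroids)
# With a tiny sigma the soft responsibilities are (numerically) one-hot
# and agree with the hard K-Means assignments.
r = responsibilities(X, centroids, sigma=1e-3)
print(np.array_equal(r.argmax(axis=1), labels))
```

Recording the objective after each assignment step shows why the iteration settles down. Neither the assignment step nor the update step can increase the objective, and there are only finitely many partitions of the data. In practice the loop stops after a few iterations, and `n_iter` caps it regardless.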

Further Readings

Books and Lectures

Murphy, Kevin P. “Chapter 21.3. K-Means Clustering.” In Probabilistic Machine Learning: An Introduction. MIT Press, 2022.
Hal Daumé III. “Chapter 3.4. K-Means Clustering.” In A Course in Machine Learning, January 2017.
Hal Daumé III. “Chapter 15.1. K-Means Clustering.” In A Course in Machine Learning, January 2017.
Bishop, Christopher M. “Chapter 9.1. K-Means Clustering.” In Pattern Recognition and Machine Learning. New York: Springer-Verlag, 2016.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. “Chapter 12.4.1. K-Means Clustering.” In An Introduction to Statistical Learning: With Applications in R. Boston: Springer, 2022.
Raschka, Sebastian. “Chapter 10.1. Grouping objects by similarity using k-means.” In Machine Learning with PyTorch and Scikit-Learn.
Jung, Alexander. “Chapter 8.1. Hard Clustering with K-Means.” In Machine Learning: The Basics. Singapore: Springer Nature Singapore, 2023.
Tan, Vincent. “Lecture 17a.” In MA4270 Data Modelling and Computation.
CIS 520: Clustering
STAT508: Lesson 12. Cluster Analysis

Notebooks

These two are very good resources for many other topics as well.

Online Resources

Assumptions of K-Means:
- 1
- 2
- 3
- 4
Interview Questions:
- Chip Huyen’s ML Interviews Book
Wikipedia: K-Means Clustering
Wikipedia: Voronoi Diagram
StatQuest: K-means clustering
Mathematicalmonk: K-means clustering
Proof That K-Means Converges in Finitely Many Steps

Implementations

Scikit-learn
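Below is a short usage sketch with scikit-learn's `sklearn.cluster.KMeans` estimator. The data is synthetic, two Gaussian blobs made up for illustration. `n_init=10` is passed explicitly because its default changed across scikit-learn releases. With this setting, the algorithm runs from 10 different initializations and keeps the run with the lowest inertia.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])

# n_init: number of restarts; the run with the lowest inertia is kept.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(model.cluster_centers_)  # one row per cluster centroid
print(model.labels_[:5])       # cluster index of the first five points
print(model.inertia_)          # within-cluster sum of squared distances

# Assign unseen points to their nearest learned centroid.
new_points = np.array([[0.2, -0.1], [4.8, 5.3]])
print(model.predict(new_points))
```

`inertia_` is the same within-cluster sum of squares that K-Means minimizes. Comparing it across runs, or across values of `n_clusters`, is a common way to inspect the fit.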
