K-Means

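K-Means is a non-probabilistic clustering algorithm that groups data points into a specified number of clusters. It can, however, be viewed as a limiting case of the EM algorithm for a Gaussian mixture model: when all components share the same isotropic covariance and that variance tends to zero, the soft assignments of EM become the hard assignments of K-Means.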
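As a rough illustration of the grouping procedure, here is a minimal NumPy sketch of the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function name `kmeans`, its parameters, the random initialisation from data points, and the toy two-blob dataset are all assumptions made for this example, not taken from any of the references below.

```python
import numpy as np


def kmeans(X, k, n_iters=100, seed=0):
    """Illustrative Lloyd iteration: hard assignments, then mean updates."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise centroids with k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Toy data (hypothetical): two well-separated blobs in 2D.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2)),
])
centroids, labels = kmeans(X, k=2)
```

On data like this the two centroids should settle near the two blob centres. The result depends on the initialisation in general, so practical implementations usually run several restarts or use a smarter seeding scheme.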

Further Readings

Books and Lectures

  • Murphy, Kevin P. “Chapter 21.3. K-Means Clustering.” In Probabilistic Machine Learning: An Introduction. MIT Press, 2022.

  • Daumé III, Hal. “Chapter 3.4. K-Means Clustering.” In A Course in Machine Learning, January 2017.

  • Daumé III, Hal. “Chapter 15.1. K-Means Clustering.” In A Course in Machine Learning, January 2017.

  • Bishop, Christopher M. “Chapter 9.1. K-Means Clustering.” In Pattern Recognition and Machine Learning. New York: Springer-Verlag, 2016.

  • James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. “Chapter 12.4.1. K-Means Clustering.” In An Introduction to Statistical Learning: With Applications in R. Boston: Springer, 2022.

  • Raschka, Sebastian. “Chapter 10.1. Grouping objects by similarity using k-means.” In Machine Learning with PyTorch and Scikit-Learn.

  • Jung, Alexander. “Chapter 8.1. Hard Clustering with K-Means.” In Machine Learning: The Basics. Singapore: Springer Nature Singapore, 2023.

  • Tan, Vincent. “Lecture 17a.” In MA4270 Data Modelling and Computation.

  • CIS 520: Clustering

  • STAT508: Lesson 12. Cluster Analysis

Notebooks

These two are very good resources for many other topics as well.

Online Resources

Implementations

Common FAQ