Application: Image Compression and Segmentation#

import sys
from pathlib import Path

parent_dir = str(Path().resolve().parents[3])
print(parent_dir)
sys.path.append(parent_dir)

from typing import Any, Dict, Tuple
from urllib.request import urlopen

import cv2
import math
import numpy as np
import matplotlib.pyplot as plt
import PIL.Image
from rich.pretty import pprint
from skimage import color
from sklearn.cluster import KMeans
from tabulate import tabulate

from src.clustering.kmeans.kmeans import KMeansLloyd
from src.utils.plot import use_svg_display

use_svg_display()
/home/runner/work/gaohn-probability-stats/gaohn-probability-stats
root_dir = Path().resolve().parents[0]
assets_dir = root_dir / "assets"

Introduction#

Image compression is the process of reducing the size of an image file while preserving as much of its quality as possible.

This is important because images take up a lot of memory and bandwidth, and therefore, compressing them can save time, space, and money. A frequently used technique for image compression is vector quantization, which maps each pixel in an image to a more limited set of representative colors. One method to accomplish this is through K-Means clustering, where the image is divided into \(K\) clusters and each pixel’s color is replaced by the average color of its cluster. This results in the image being represented by \(K\) distinct colors, which significantly reduces its size.

K-Means Clustering#

In this section, we give readers a short primer on the definition of K-Means and the algorithm that governs it.

At a high level, K-Means is an unsupervised machine-learning algorithm used for clustering. Given a set of data points, the algorithm groups the points into \(K\) clusters, where \(K\) is a user-defined number of clusters fixed a priori. The algorithm calculates the mean of each cluster, known as the cluster centroid, and assigns each data point to the closest cluster. This process is repeated until the cluster centroids converge to stable values.

Let’s make this idea more concrete by introducing the concept with proper notations.

See K-Means concept for a detailed explanation of the algorithm.

Problem Statement#

Given a set of \(N\) data points

\[ \mathcal{S}=\left\{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N)}\right\} \subseteq \mathbb{R}^D \]

where the \(n\)-th sample is a vector with \(D\) features,

\[ \mathbf{x}^{(n)} \in \mathbb{R}^D=\left[\begin{array}{llll} x_1^{(n)} & x_2^{(n)} & \cdots & x_D^{(n)} \end{array}\right]^{\mathrm{T}} \]

The K-Means algorithm aims to group the data points into \(K\) clusters

\[ \hat{C}=\left\{\hat{C}_1, \hat{C}_2, \ldots, \hat{C}_K\right\} \]

where

\[ \hat{C}_k=\left\{\mathbf{x}^{(n)} \in \mathbb{R}^D \mid \mathcal{A}(n)=k\right\} \]

Note that \(\mathcal{A}(\cdot)\) is the assignment map that “predicts” and “classifies” each data point into its respective cluster. What follows is the definition of centroids (cluster centers),

\[ \boldsymbol{v}_1, \boldsymbol{v}_2, \ldots, \boldsymbol{v}_K \in \mathbb{R}^D \]

where \(\boldsymbol{v}_k\) is the centroid of cluster \(C_k\). Intuitively, these vectors should “represent” their respective clusters well. It can be proven that each centroid is nothing but the mean vector \(\boldsymbol{\mu}_k\) of cluster \(C_k\).

To this end, we want to find the assignment \(\mathcal{A}(\cdot)\) and the cluster center \(\boldsymbol{v}_k\) such that the sum of squared distances between each data point and its cluster center is minimized. Mathematically, this means partitioning the data points according to the Voronoi Diagram.

More formally, we minimize the following objective function:

\[\begin{split} \begin{aligned} & \underset{\mathcal{A}, \boldsymbol{v}_1, \ldots, \boldsymbol{v}_K}{\operatorname{argmin}} \hat{\mathcal{J}}_{\mathcal{S}}\left(\mathcal{A}, \boldsymbol{v}_1, \boldsymbol{v}_2, \ldots, \boldsymbol{v}_K\right)=\sum_{k=1}^K \sum_{n \,:\, \mathcal{A}(n)=k}\left\|\mathbf{x}^{(n)}-\boldsymbol{v}_k\right\|^2 \\ & \text{ subject to } \quad \hat{C}_1 \sqcup \hat{C}_2 \sqcup \cdots \sqcup \hat{C}_K=\mathcal{S} \\ & \boldsymbol{v}_1, \boldsymbol{v}_2, \ldots, \boldsymbol{v}_K \in \mathbb{R}^D \end{aligned} \end{split}\]

The problem in itself seems manageable, since we are trying to partition the data points into \(K\) clusters while minimizing the intra-cluster distances (variances). However, jointly optimizing both the assignment map \(\mathcal{A}(\cdot)\) and the centroids \(\boldsymbol{v}_k\) is an NP-hard problem and is computationally challenging.

Consequently, many heuristics are used to solve the problem. We will discuss one of the most popular, Lloyd’s algorithm, in the next section.
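
For concreteness, here is a minimal NumPy sketch of how this objective can be evaluated for a given assignment (the name kmeans_objective is hypothetical; it assumes a data matrix X of shape (N, D), an integer array labels produced by an assignment map, and a (K, D) array of centroids, and reuses the numpy import above):

def kmeans_objective(X: np.ndarray, labels: np.ndarray, centroids: np.ndarray) -> float:
    """Sum of squared distances between each point and its assigned centroid."""
    # centroids[labels] broadcasts each point's assigned centroid to shape (N, D)
    return float(np.sum((X - centroids[labels]) ** 2))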

Lloyd’s Algorithm#

Lloyd’s algorithm is the most well-known algorithm for greedily optimizing and solving the K-Means clustering problem. It is greedy because it takes the locally best step at each iteration. Consequently, it only guarantees that the objective function converges to a local minimum.

What follows is a description of Lloyd’s algorithm.

Step 1. Initialization Step.#

Initialize \(K\) cluster centers \(\boldsymbol{\mu}_1^{(0)}, \boldsymbol{\mu}_2^{(0)}, \ldots, \boldsymbol{\mu}_K^{(0)}\) randomly (ideally far apart), where the superscript \((0)\) denotes the iteration number \(t=0\).

Some things to note:

  • In the very first iteration, there are no data points in any cluster \(\hat{C}_k^{(0)}=\emptyset\). Therefore, the cluster centers are just randomly chosen for simplicity.

  • By random, we mean that the cluster centers are randomly chosen from the data points \(\mathcal{S}=\left\{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N)}\right\}\) and not randomly chosen from the feature space \(\mathbb{R}^D\).

  • Subsequent iterations will have data points in the clusters \(\hat{C}_k^{(t)} \neq \emptyset\) and thus \(\boldsymbol{\mu}_k^{(t)}\) will be the mean of the data points in cluster \(k\).

  • Each \(\boldsymbol{\mu}_k^{(0)}=\left[\begin{array}{llll}\mu_{1 k}^{(0)} & \mu_{2 k}^{(0)} & \cdots & \mu_{D k}^{(0)}\end{array}\right]^{\mathrm{T}}\) is a \(D\)-dimensional vector, where \(D\) is the number of features, and represents the mean vector of all the data points in cluster \(k\). Note that \(\mu_{d k}^{(0)}\) is the mean value of the \(d\)-th feature in cluster \(k\).

  • We denote \(\boldsymbol{\mu}=\left[\begin{array}{llll}\boldsymbol{\mu}_1 & \boldsymbol{\mu}_2 & \cdots & \boldsymbol{\mu}_K\end{array}\right]_{K \times D}^T\) to be the collection of all \(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \ldots, \boldsymbol{\mu}_K\)

Step 2. Assignment Step.#

Then, for \(t=0,1,2,\ldots\):

Assignment Step (E): Assign each data point \(\mathbf{x}^{(n)}\) to the closest cluster center \(\boldsymbol{\mu}_k^{(t)}\),

\[ \hat{y}^{(n)(t)}:=\mathcal{A}^{*(t)}(n)=\underset{k \in \{1,2, \ldots, K\}}{\operatorname{argmin}}\left\|\mathbf{x}^{(n)}-\boldsymbol{\mu}_k^{(t)}\right\|^2 \]

In other words, \(\hat{y}^{(n)(t)}\) is the output of the optimal assignment rule at the \(t\)-th iteration: the index of the cluster center \(\boldsymbol{\mu}_k^{(t)}\) that is closest to \(\mathbf{x}^{(n)}\).

Step 3. Update Step.#

Update Step (M): Update the cluster centers for the next iteration.

\[ \boldsymbol{\mu}_k^{(t+1)}=\frac{1}{\left|\hat{C}_k^{(t)}\right|} \sum_{\mathbf{x}^{(n)} \in \hat{C}_k^{(t)}} \mathbf{x}^{(n)} \]

Notice that the cluster center \(\boldsymbol{\mu}_k^{(t+1)}\) is the mean of all data points that are assigned to cluster \(k\).

Step 4. Repeat Till Convergence#

Repeat steps 2 and 3 until the centroids stop changing.

\[ \boldsymbol{\mu}_k^{(t+1)}=\boldsymbol{\mu}_k^{(t)} \quad \text{for all } k \in \{1, \ldots, K\} \]

This is the convergence condition.

In the above algorithm, we have assumed without proof that the assignment map \(\mathcal{A}(\cdot)\) and the centroids \(\boldsymbol{\mu}_k\) do indeed minimize the objective function \(\mathcal{J}\) at each step. For a detailed proof, see the concept section.
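
To tie the four steps together, here is a minimal, illustrative NumPy sketch of Lloyd’s algorithm (the name lloyd_kmeans is hypothetical, and this is not the KMeansLloyd class imported earlier; it also assumes no cluster ever empties out, a case a production implementation must handle):

def lloyd_kmeans(X: np.ndarray, k: int, max_iter: int = 100, seed: int = 42) -> Tuple[np.ndarray, np.ndarray]:
    """Return (labels, centroids) after running Lloyd's algorithm on X of shape (N, D)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize centroids by sampling K distinct data points (not random points in R^D).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 2 (assignment): assign each point to its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # (N, K)
        labels = distances.argmin(axis=1)
        # Step 3 (update): each centroid becomes the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop once the centroids no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids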

With this, we have defined what K-Means is.

Vector Quantization#

Recall that vector quantization is a technique used in image compression to reduce the amount of data needed to represent an image, and that K-Means is a method of vector quantization. We shall see below that they take equivalent forms.

Given a sequence of real-valued vectors \(\left\{\mathbf{x}^{(n)}\right\}_{n=1}^N\), where each \(\mathbf{x}^{(n)} \in \mathbb{R}^{D}\), we can use vector quantization to perform lossy compression as follows.

First, we replace each real-valued vector \(\mathbf{x}^{(n)} \in \mathbb{R}^D\) with a discrete symbol \(\mathbf{z}^{(n)} \in \{1, \ldots , K\}\). How do we find such a \(\mathbf{z}^{(n)}\)?

We need to define a codebook 1 of \(K\) prototypes \(\left\{\boldsymbol{\mu}_k\right\}_{k=1}^K \subseteq \mathbb{R}^D\). Since each real-valued vector \(\mathbf{x}^{(n)} \in \mathbb{R}^D\) is replaced with a discrete symbol \(\mathbf{z}^{(n)} \in \{1, \ldots , K\}\), we can think of the codebook as a lookup table that maps each symbol to a prototype.

Finally, we encode each real-valued vector \(\mathbf{x}^{(n)} \in \mathbb{R}^D\) by identifying the prototype that is most similar to it, using Euclidean distance as a measure of similarity:

\[ h\left(\mathbf{x}^{(n)}\right) = \text{encode}\left(\mathbf{x}^{(n)}\right) = \underset{k}{\operatorname{argmin}} \left\|\mathbf{x}^{(n)} - \boldsymbol{\mu}_k\right\|^2 \]

In other words, \(h(\mathbf{x}^{(n)}) = \mathbf{z}^{(n)} = k\) if \(\boldsymbol{\mu}_k\) is the closest prototype to \(\mathbf{x}^{(n)}\).

The quality of a codebook can be assessed by determining the reconstruction error or the amount of distortion it causes, as expressed by the following equation:

\[ \mathcal{J} = \frac{1}{N} \sum_{n=1}^N \left\|\mathbf{x}^{(n)} - \text{decode}\left(\text{encode}\left(\mathbf{x}^{(n)}\right)\right)\right\|^2 = \frac{1}{N} \sum_{n=1}^N \left\|\mathbf{x}^{(n)} - \boldsymbol{\mu}_{\mathbf{z}^{(n)}}\right\|^2 \]

where the decode map is

\[ \boldsymbol{\mu}_{\mathbf{z}^{(n)}} = \hat{\mathbf{x}}^{(n)} = r(\mathbf{z}^{(n)}) = r(h(\mathbf{x}^{(n)})) \]

This cost function has exactly the same form as the K-Means objective. In theory, we can achieve zero distortion by assigning one prototype to every data vector, i.e., using \(K = N\) and setting \(\boldsymbol{\mu}_n = \mathbf{x}^{(n)}\). However, this does not compress the data at all: it requires \(O(NDB)\) bits to represent, where \(N\) is the number of real-valued data vectors, each of length \(D\), and \(B\) is the number of bits needed to represent a real-valued scalar (i.e., the quantization accuracy).
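
As a minimal sketch (the helper names encode, decode, and distortion are hypothetical, and the symbols are 0-indexed rather than the 1-indexed notation above), these maps and the distortion \(\mathcal{J}\) can be written as:

def encode(X: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each row x^(n) to the index z^(n) of its nearest prototype."""
    distances = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)  # (N, K)
    return distances.argmin(axis=1)

def decode(z: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Look up the prototype mu_{z^(n)} for each symbol z^(n)."""
    return codebook[z]

def distortion(X: np.ndarray, codebook: np.ndarray) -> float:
    """Average squared reconstruction error J over the N vectors."""
    X_hat = decode(encode(X, codebook), codebook)
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))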

A Primer on Binary Digits (Bits) and 8-bit Unsigned Integers#

To better appreciate how image compression works, it is important to understand how images (and any information) are stored in computers.

Information is stored in computers as a series of bits, which are the smallest units of information used in a computer. They have binary values, meaning they can either be 0 or 1. A series of bits is used to represent data and constitutes the backbone of all digital information.

Images are often stored in 8-bit precision, which means they are represented as 8-bit unsigned integers. An 8-bit unsigned integer is an integer data type that can store a value between 0 and 255, inclusive, without a sign bit. Each bit in an 8-bit unsigned integer occupies one unit of storage and can only have a value of 0 or 1.

For example, let’s say a pixel value is 100. It can be represented in 8-bit binary form as 01100100. In an 8-bit unsigned integer representation, the most significant bit (leftmost bit) represents 128 (\(2^{7}\)), the second most significant bit represents 64 (\(2^{6}\)), the third most significant bit represents 32 (\(2^{5}\)), and so on, until the least significant bit (rightmost bit) represents 1 (\(2^{0}\)). Therefore, the value of 100 in 8-bit unsigned integer can be decomposed as follows:

\[\begin{split} \begin{align} 100 &= 01100100 \\ &= 0 \times 2^7 + 1 \times 2^6 + 1 \times 2^5 + 0 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0 \\ &= 0 + 64 + 32 + 0 + 0 + 4 + 0 + 0 \\ &= 100 \end{align} \end{split}\]

This is why most images range from 0 to 255, as they are stored as 8-bit unsigned integers.
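
We can verify this decomposition with Python’s built-in binary formatting:

print(format(100, "08b"))  # '01100100'
print(int("01100100", 2))  # 100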

The number of bits needed to store an image depends on the size of the image, the number of color channels, and the number of bits used to represent each color channel.

Number of Bits needed for a Positive Integer#

Given a positive integer \(K\), the number of bits needed to represent it is:

\[ n_{bits} = \lceil \log_2 K \rceil \]

where:

  • \(n_{bits}\) is the number of bits required;

  • \(\lceil \cdot \rceil\) is the ceiling function, which rounds up to the nearest integer.

For example, if \(K = 16\), then \(n_{bits} = \lceil \log_2 16 \rceil = \lceil 4 \rceil = 4\).

More concretely, there are a total of \(16\) numbers from \(0\) to \(15\), so the least number of bits required to represent these numbers is \(4\).

  • \(0 \to 0000\)

  • \(1 \to 0001\)

  • \(10 \to 1010 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 8 + 2 = 10\)

  • \(15 \to 1111\)

Thus, with \(4\) bits, you can represent \(16\) different values (\(0000, 0001, 0010, ..., 1111\)), which is enough to represent the \(16\) different numbers from \(0\) to \(15\).

def calculate_num_bits(arr: np.ndarray, max_int: int) -> Tuple[int, int]:
    """
    Calculate the number of bits required to store a NumPy array.
    
    Parameters:
        arr (np.ndarray): A NumPy array.
        max_int (int): The maximum integer value allowed in this number system.
    
    Returns:
        num_bits (int): The number of bits required to store each value.
        total_bits (int): The number of bits required to store the array.
    """
    # bits per value: ceil(log2(max_int)); e.g. max_int=16 -> 4 bits, max_int=255 -> 8 bits
    num_bits = int(math.ceil(math.log2(max_int)))
    total_bits = num_bits * arr.size
    return num_bits, total_bits
calculate_num_bits(np.array([1, 2, 13, 16, 10]), 16)
(4, 20)

Image Compression with K-Means#

In image compression, we use K-Means to group similar pixels into \(K\) clusters. Each cluster centroid represents a representative color for the pixels in the cluster, and we can map each pixel to the closest centroid. This reduces the number of colors required to represent the image, and thus the size of the image data. The mapping of each pixel to a cluster centroid can be stored using a smaller number of bits compared to the original 24-bit RGB values.

Where did this 24-bit come from? Let’s define the problem statement and find out!

Problem Statement#

Now, let’s consider the problem of image compression.

Let an image be denoted by \(\mathbf{I} \in \mathbb{R}^{m \times n \times 3}\), where \(m\) is the width of the image, \(n\) is the height of the image, and 3 is the number of color channels representing the red, green, and blue colors. We have the following properties:

  • There are a total of \(N = m \times n\) pixels in the image \(\mathbf{I}\).

  • Each pixel \(\mathbf{x} = \begin{bmatrix} r & g & b \end{bmatrix}^T \in \mathbb{Z}_{rgb}^3\) of the image \(\mathbf{I}\) is a 3-dimensional vector residing in the 3-dimensional integer space, comprising the intensities of the red, green and blue channels.

  • \(\mathbb{Z}_{rgb}\) represents all integers from \(0\) to \(255\). Therefore, each color channel in the pixel is of the range 0 to 255, and is assumed to be an 8-bit unsigned integer.

Based on our earlier discussion of bits, we require 8 bits to represent numbers from \(0\) to \(255\). It follows that each pixel contains \(3 \times 8 = 24\) bits of information. Consequently, the total number of bits required to represent the image is \(24 \times N = 24N\) bits.

Our aim is to find a compression map \(h: \mathbb{R}^{N} \rightarrow \mathbb{R}^{M}\), where \(M < N\), such that the number of bits required to represent the compressed image is less than the number of bits required to represent the original image.

The compressed output \(\mathbf{z} = h(\mathbf{x})\) is therefore called a code (compressed representation) of the input \(\mathbf{x}\) (original representation).

We can then reconstruct the original image \(\mathbf{I}\) from the compressed image \(\mathbf{z}\) by a reconstruction map \(r: \mathbb{R}^{M} \rightarrow \mathbb{R}^{N}\), such that \(\hat{\mathbf{x}} = r(\mathbf{z}) = r(h(\mathbf{x}))\). Note that \(\hat{\mathbf{x}}\) is the reconstructed representation of \(\mathbf{x}\) and may or may not be the same as \(\mathbf{x}\). If they are not the same, then the compression is lossy.

Steps to Compress an Image#

Here is a step-by-step guide to using K-Means for image compression; we will later solidify this with a code implementation.

  • Read the image: Load the image \(\mathbf{I}\) into memory. Typically \(\mathbf{I}\) will be converted to a numpy array of shape \(m \times n \times 3\).

  • Reshape the image data: Reshape the image matrix \(\mathbf{I}\) into a 2D array of size \(N \times 3\), where \(N = m \times n\) is the total number of pixels in the image. Each row of the array represents a pixel in the image. This step is necessary because K-Means expects a 2D array as input. One can now think of \(N \times 3\) as a dataset \(\mathcal{S}\) with \(N\) data points, with \(D=3\) features.

  • Apply K-Means clustering: Apply K-Means clustering to the reshaped image data, using a user-defined number of clusters, \(K\). This will group similar pixels into \(K\) clusters and calculate the cluster centroids \(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \ldots, \boldsymbol{\mu}_K\). Each centroid is a 3-dimensional vector representing the RGB values of the representative color for the cluster.

  • Compress the image: For each pixel in the image, find the closest cluster centroid \(\boldsymbol{\mu}_k\) and replace the pixel with the index \(\hat{y}^{(n)}\) of the centroid.

  • Store the compressed image: Store the compressed image, which includes the cluster centroids and the mapping of each pixel to a centroid, to disk.

Example#

To put things into perspective, let’s consider a simple example.

Let us consider an image of size \(100 \times 100 \times 3\) pixels.

Then there are \(N = 100 \times 100 = 10,000\) pixels in the image, and each pixel is represented by a 3-dimensional vector.

Assuming that each value in the RGB tuple is stored with 8 bits of precision, the total number of bits required to represent the image is

\[ 24N = 24 \times 100 \times 100 = 240,000 \text{ bits} \]

Then suppose we run K-Means with \(K=16\) clusters. We will get \(16\) cluster centroids, each of which is a \(3\)-dimensional vector. This requires us to store \(16 \times 3 = 48\) numbers, each of which is represented with 8 bits of precision. This leads to a total of \(16 \times 3 \times 8 = 384\) bits to store the cluster centroids.

Next we also need to store the mapping of each pixel in the image to a cluster centroid. This is an \(N \times 1\) vector, and for each pixel, we are only storing a single number ranging from 0 to 15. This means that we will need \(\lceil \log_2 16 \rceil = 4\) bits to represent the index of the cluster centroid to which a pixel is assigned. This amounts to a total of \(10,000 \times 4 = 40,000\) bits to store the mapping of each pixel to a cluster centroid.

Thus, the total number of bits required to store the compressed image is

\[ 40,000 + 384 = 40,384 \text{ bits} \]

compared to the original image size of \(240,000\) bits. This results in a compression ratio of

\[ 40,384 / 240,000 \approx 16.8\% \]
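
These back-of-the-envelope numbers are easy to verify in code (a quick sketch reusing the math import at the top of this notebook):

N = 100 * 100                              # number of pixels
original_bits = N * 3 * 8                  # 8 bits per RGB channel
centroid_bits = 16 * 3 * 8                 # K = 16 centroids, 3 channels, 8 bits each
index_bits = N * math.ceil(math.log2(16))  # 4 bits per pixel label
compressed_bits = centroid_bits + index_bits
print(original_bits, compressed_bits, f"{compressed_bits / original_bits:.1%}")
# 240000 40384 16.8%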

Step by step code implementation#

In this section, we will go through the process of applying K-Means for compressing images. Specifically, we will use a hestain image, which is a type of tissue image that has been stained with hematoxylin and eosin (H&E). This staining technique is utilized by pathologists to differentiate between tissue types that are colored blue-purple and pink 2.

Load and Read the Image#

image_path = "https://storage.googleapis.com/reighns/images/hestain.png"
image = np.array(PIL.Image.open(urlopen(image_path)).convert("RGB"))
image_shape = image.shape
print(f"Image shape: {image_shape}")
print(f"Image dtype: {image.dtype}") # to make sure it's uint8
Image shape: (227, 303, 3)
Image dtype: uint8

Here are the steps for the code:

  • Define the image path as a URL, "https://storage.googleapis.com/reighns/images/hestain.png"

  • Convert the image into a numpy array using the PIL library, by opening the image with the PIL.Image.open method and converting it into RGB format using the convert method.

  • Store the shape of the image as a tuple in the variable image_shape.

Size of the Image#

_, image_size_in_bits = calculate_num_bits(image, 255)
print(f"Image size in bits: {image_size_in_bits}")
Image size in bits: 1650744
plt.axis("off")
plt.imshow(image);
plt.savefig(f"{assets_dir}/hestain.png", dpi=300)

Applying Compression via K-Means#

In what follows, we will write a function compress_image that takes in an image represented as a numpy array image and other optional parameters as a dictionary kmeans_kwargs, which are passed to the K-Means algorithm. The function performs the following steps:

  • Reshapes the image from (height, width, depth) to (height * width, depth) to convert it into a 2D array, where each row is a pixel represented by its RGB values.

  • Instantiates the K-Means algorithm with the given kmeans_kwargs and fits the K-Means model on the pixel data.

  • Extracts the predicted cluster labels for each pixel and names it compressed_image.

  • Extracts the predicted centroids (mean values) of each cluster and names it centroids.

  • Returns compressed_image and centroids. The former has a shape of \((N,)\), where \(N\) is the number of pixels in the image, and the latter has a shape of \((K, 3)\), where \(K\) is the number of clusters. Note that compressed_image is nothing but a 1D array of cluster labels, and centroids is a 2D array of RGB values of the cluster centroids.

def compress_image(
    image: np.ndarray, **kmeans_kwargs: Any
) -> Tuple[np.ndarray, np.ndarray]:
    pixels = image.reshape((image.shape[0] * image.shape[1], 3))

    kmeans = KMeans(**kmeans_kwargs)
    kmeans.fit(pixels)

    compressed_image = kmeans.labels_.astype(np.uint8)  # cluster label per pixel, cast to uint8
    centroids = kmeans.cluster_centers_  # (K, 3) array of cluster means; total bits = 3 * 8 * K

    return compressed_image, centroids

Let’s run the code with K=2.

K = 2
compressed_image, centroids = compress_image(
    image, n_clusters=K, n_init="auto", max_iter=300, random_state=42
)

print(centroids)
[[135.29518148  83.27374723 191.3692385 ]
 [205.86058271 141.38666637 238.61435827]]

The function get_image_compression_ratio_for_kmeans takes in five arguments:

  • image: a NumPy array representing the original image

  • compressed_image: a NumPy array representing the compressed image obtained using the K-Means algorithm

  • centroids: a NumPy array representing the centroids (mean values) of each cluster obtained using the K-Means algorithm

  • k: an integer representing the number of clusters used for the K-Means algorithm

  • display: a boolean indicating whether to display the compression ratio. This parameter is optional and defaults to True.

The function then calculates the original size, compressed image size, and centroids size in bits by calling the calculate_num_bits function with the appropriate arguments. The calculate_num_bits function calculates the number of bits required to represent the input data based on the maximum value that can be stored for that data type. For example, for an image with pixel values ranging from 0 to 255, each pixel value can be represented using 8 bits.

Next, the function calculates the total compressed size by adding the compressed image size and centroids size. The compression ratio is then calculated by dividing the compressed size by the original size.

If the display parameter is True, the function prints the table containing the original size, compressed size, and compression ratio.

# def get_image_compression_ratio(
#     image: np.ndarray, compressed_image: np.ndarray, centroids: np.ndarray
# ) -> float:
#     original_size = sys.getsizeof(image.tobytes()) * 8 # in bits
#     compressed_image_size = sys.getsizeof(compressed_image.tobytes()) * 8 # in bits
#     centroids_size = sys.getsizeof(centroids.tobytes()) * 8 # in bits
#     compressed_size = compressed_image_size + centroids_size
#     compression_ratio = original_size / compressed_size
#     print(f"Original size: {original_size}\nCompressed Image size: {compressed_image_size}\nCentroids size: {centroids_size}\nCompressed size: {compressed_size}\nCompression ratio: {compression_ratio}")
#     return compression_ratio

def get_image_compression_ratio_for_kmeans(
    image: np.ndarray, compressed_image: np.ndarray, centroids: np.ndarray, k: int, display: bool = True
) -> float:
    _, original_size = calculate_num_bits(image, 255)  # in bits
    _, compressed_image_size = calculate_num_bits(compressed_image, k)  # in bits
    _, centroids_size = calculate_num_bits(centroids, 255)  # in bits
    compressed_size = compressed_image_size + centroids_size
    compression_ratio = compressed_size / original_size
    
    if display:
        data = [
            ["Original size", original_size],
            ["Compressed Image size", compressed_image_size],
            ["Centroids size", centroids_size],
            ["Compressed size", compressed_size],
            ["Compression ratio", f"{compression_ratio:.2f}"]
        ]

        table = tabulate(data, headers=["Metric", "Value"], tablefmt="pretty")
        print(table)
    return compression_ratio
compressed_ratio = get_image_compression_ratio_for_kmeans(
    image, compressed_image, centroids, K
)
+-----------------------+---------+
|        Metric         |  Value  |
+-----------------------+---------+
|     Original size     | 1650744 |
| Compressed Image size |  68781  |
|    Centroids size     |   48    |
|    Compressed size    |  68829  |
|   Compression ratio   |  0.04   |
+-----------------------+---------+

Reconstruction of the Image#

This code defines a function called reconstruct_image, which takes three arguments:

  • compressed_image: A NumPy array of shape \((N,)\) representing the compressed image, where \(N\) is the number of pixels in the original image.

  • centroids: A NumPy array of shape (K, 3) representing the cluster centroids obtained after performing K-Means on the original image.

  • original_shape: A tuple representing the shape of the original image, in the format (height, width, depth).

The function performs the following steps to reconstruct the original image from the compressed form:

  1. It uses the compressed_image array to index into the centroids array, effectively replacing each pixel value in compressed_image with its corresponding cluster centroid.

  2. It then reshapes the resulting array to match the original shape of the image.

  3. Finally, it returns the reconstructed image as a NumPy array of unsigned 8-bit integers.

In summary, the reconstruct_image function takes a compressed image represented as a 1D array of cluster labels and the RGB values of the cluster centroids, and returns the original image in its 3D NumPy array representation.

def reconstruct_image(
    compressed_image: np.ndarray, centroids: np.ndarray, original_shape: Tuple[int, int, int]
) -> np.ndarray:
    reconstructed_image = centroids[compressed_image]
    reconstructed_image = reconstructed_image.reshape(original_shape).astype(np.uint8)
    return reconstructed_image
reconstructed_image = reconstruct_image(compressed_image, centroids, image.shape)

# fig = plt.figure(figsize=(8, 4))

plt.axis("off")
plt.imshow(reconstructed_image);
plt.savefig(f"{assets_dir}/reconstructed_image.png", dpi=300)

To make the colors more distinct, we can use skimage.color.

colored_image = color.label2rgb(compressed_image.reshape(image.shape[:2]), image)
plt.imshow(colored_image);
fig, axes = plt.subplots(nrows=1, ncols=5, figsize=(20, 5), sharex=True, sharey=True)

axes[0].imshow(image)
axes[0].set_title("Original image")
axes[0].axis("off")

# centroids row k holds the centroid for cluster k, so to reconstruct
# the image we replace each pixel with the centroid of the cluster
# it was assigned to, i.e. each pixel becomes its cluster's centroid
for index, k in enumerate([2, 3, 6, 16], start=1):
    compressed_image, centroids = compress_image(image, n_clusters=k, n_init="auto", max_iter=300, random_state=42)
    compression_ratio = get_image_compression_ratio_for_kmeans(image, compressed_image, centroids, k)
    reconstructed_image = reconstruct_image(compressed_image, centroids, image.shape)
    axes[index].set_title(f"K = {k}, compression ratio = {compression_ratio:.2f}")
    axes[index].imshow(reconstructed_image)
    axes[index].axis("off")

fig.savefig(f"{assets_dir}/kmeans_compression.png", dpi=300)
+-----------------------+---------+
|        Metric         |  Value  |
+-----------------------+---------+
|     Original size     | 1650744 |
| Compressed Image size |  68781  |
|    Centroids size     |   48    |
|    Compressed size    |  68829  |
|   Compression ratio   |  0.04   |
+-----------------------+---------+
+-----------------------+---------+
|        Metric         |  Value  |
+-----------------------+---------+
|     Original size     | 1650744 |
| Compressed Image size | 137562  |
|    Centroids size     |   72    |
|    Compressed size    | 137634  |
|   Compression ratio   |  0.08   |
+-----------------------+---------+
+-----------------------+---------+
|        Metric         |  Value  |
+-----------------------+---------+
|     Original size     | 1650744 |
| Compressed Image size | 206343  |
|    Centroids size     |   144   |
|    Compressed size    | 206487  |
|   Compression ratio   |  0.13   |
+-----------------------+---------+
+-----------------------+---------+
|        Metric         |  Value  |
+-----------------------+---------+
|     Original size     | 1650744 |
| Compressed Image size | 275124  |
|    Centroids size     |   384   |
|    Compressed size    | 275508  |
|   Compression ratio   |  0.17   |
+-----------------------+---------+

We can also use our own implementation of K-Means to perform image compression, and we see that the results are the same. However, ours is much slower, since not every operation is vectorized.

pixels = image.reshape((image.shape[0] * image.shape[1], 3))
kmeans = KMeansLloyd(num_clusters=2, init="random", max_iter=30, random_state=2023)

kmeans.fit(pixels)
y_preds = kmeans.labels
centroids = kmeans.centroids
# print(centroids)
segmented_image = centroids[y_preds]

segmented_image = segmented_image.reshape(image.shape) # reshape back

segmented_image = segmented_image.astype(np.uint8)  # cast to uint8 for display

plt.imshow(segmented_image)
Using Seed Number 2023
Converged at iteration 23
<matplotlib.image.AxesImage at 0x7f9764d5b8b0>
image_path = "../assets/football.bmp"
image = cv2.imread(image_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(20, 10), sharex=True, sharey=True)

axes[0].imshow(image)
axes[0].set_title("Original image")
axes[0].axis("off")

for index, k in enumerate([2, 3, 6], start=1):
    compressed_image, centroids = compress_image(image, n_clusters=k, n_init="auto", max_iter=300)
    compression_ratio = get_image_compression_ratio_for_kmeans(image, compressed_image, centroids, k)
    reconstructed_image = reconstruct_image(compressed_image, centroids, image.shape)
    axes[index].set_title(f"K = {k}, compression ratio = {compression_ratio:.2f}")
    axes[index].imshow(reconstructed_image)
    axes[index].axis("off")
+-----------------------+---------+
|        Metric         |  Value  |
+-----------------------+---------+
|     Original size     | 6130560 |
| Compressed Image size | 255440  |
|    Centroids size     |   48    |
|    Compressed size    | 255488  |
|   Compression ratio   |  0.04   |
+-----------------------+---------+
+-----------------------+---------+
|        Metric         |  Value  |
+-----------------------+---------+
|     Original size     | 6130560 |
| Compressed Image size | 510880  |
|    Centroids size     |   72    |
|    Compressed size    | 510952  |
|   Compression ratio   |  0.08   |
+-----------------------+---------+
+-----------------------+---------+
|        Metric         |  Value  |
+-----------------------+---------+
|     Original size     | 6130560 |
| Compressed Image size | 766320  |
|    Centroids size     |   144   |
|    Compressed size    | 766464  |
|   Compression ratio   |  0.13   |
+-----------------------+---------+

It is important to note that while K-Means can provide significant size reductions for some images, it may not always be the best method for image compression. K-Means is a lossy compression method, meaning that the quality of the compressed image will be lower than that of the original; if you require lossless compression, a different method is needed.

The quality of the compressed image will depend on the number of clusters used and the similarity of the pixels in the image. In general, a larger number of clusters will result in higher image quality, but also a larger file size. Experimentation is often necessary to find the optimal number of clusters for a given image.
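
One common heuristic for this experimentation is the elbow method: plot the within-cluster sum of squares (exposed by scikit-learn as the inertia_ attribute) against \(K\) and look for the point of diminishing returns. Below is a minimal sketch, assuming pixels is an \((N, 3)\) pixel array reshaped from an image as above:

inertias = []
ks = [2, 3, 6, 16]
for k in ks:
    km = KMeans(n_clusters=k, n_init="auto", max_iter=300, random_state=42)
    km.fit(pixels)
    inertias.append(km.inertia_)

plt.plot(ks, inertias, marker="o")
plt.xlabel("K")
plt.ylabel("Within-cluster sum of squares")
plt.title("Elbow plot for choosing K");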

References#

  • Bishop, Christopher M. “Chapter 9.1.1. Image Segmentation and Compression.” In Pattern Recognition and Machine Learning. New York: Springer-Verlag, 2016.


1

In the context of lossy compression, a codebook is a collection of prototypes, or a set of reference vectors, that are used to represent a set of real-valued vectors in a compressed form.

2

https://www.mathworks.com/help/images/color-based-segmentation-using-k-means-clustering.html