Exploring Image Compression with K-means Clustering

Rutik Patel
Apr 2, 2024

In this blog post, we’ll delve into the fascinating world of image compression using the K-means clustering algorithm. Image compression is a crucial technique used to reduce the storage space required for images while maintaining acceptable image quality. K-means clustering, a popular unsupervised learning algorithm, plays a key role in this process by identifying representative colors in an image.

Table of contents

  1. Understanding K-means Clustering
  2. Image Compression with K-means
  3. Visualizing the Results
  4. Conclusion

Understanding K-means Clustering

Before diving into image compression, let’s briefly review how the K-means clustering algorithm works. K-means clustering is an iterative algorithm that partitions data into K clusters based on their similarity. The algorithm proceeds as follows:

  1. Initialization: Start by randomly initializing K cluster centroids.
  2. Assignment: Assign each data point to the nearest cluster centroid.
  3. Update: Recompute the cluster centroids based on the mean of the data points assigned to each cluster.
  4. Repeat: Repeat steps 2 and 3 until convergence, i.e., until the centroids no longer change significantly, or until a maximum number of iterations is reached.
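
The four steps above can be sketched in NumPy as follows. This is a minimal illustration and not necessarily the implementation used for this post: the function names mirror the helpers called later (kMeans_init_centroids, find_closest_centroids, run_kMeans), but their bodies, the compute_centroids helper, and the seed parameter are assumptions.

```python
import numpy as np


def kMeans_init_centroids(X, K, seed=None):
    """Step 1: pick K distinct data points at random as the initial centroids."""
    rng = np.random.default_rng(seed)  # seed is an assumed convenience parameter
    chosen = rng.choice(X.shape[0], size=K, replace=False)
    return X[chosen].astype(float)


def find_closest_centroids(X, centroids):
    """Step 2: return, for each row of X, the index of its nearest centroid."""
    # Squared Euclidean distances, shape (n_points, K), via broadcasting
    diffs = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(sq_dists, axis=1)


def compute_centroids(X, idx, centroids):
    """Step 3: move each centroid to the mean of the points assigned to it."""
    new_centroids = centroids.copy()
    for k in range(centroids.shape[0]):
        members = X[idx == k]
        if len(members) > 0:  # an empty cluster keeps its previous centroid
            new_centroids[k] = members.mean(axis=0)
    return new_centroids


def run_kMeans(X, initial_centroids, max_iters=10):
    """Step 4: alternate assignment and update for up to max_iters iterations."""
    centroids = np.asarray(initial_centroids, dtype=float)
    for _ in range(max_iters):
        idx = find_closest_centroids(X, centroids)
        new_centroids = compute_centroids(X, idx, centroids)
        if np.allclose(new_centroids, centroids):  # converged early
            centroids = new_centroids
            break
        centroids = new_centroids
    # Final assignment so idx matches the returned centroids
    idx = find_closest_centroids(X, centroids)
    return centroids, idx
```

Note that the distance computation builds an array with one entry per (point, centroid, channel) combination, which is simple to read but can use a lot of memory on large images; processing the points in batches would be one way around that.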

Image Compression with K-means

Now, let’s apply K-means clustering to compress images. We’ll start with a high-resolution image represented in the RGB (Red, Green, Blue) color space. Each pixel in the image is characterized by its RGB values, which typically range from 0 to 255. Our goal is to reduce the number of unique colors in the image while preserving its visual quality.
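
To make the goal concrete, it helps to count how many distinct colors an image contains before and after compression. The snippet below is a small sketch of that count; the function name count_unique_colors and the synthetic 2x2 image are illustrative assumptions, not part of the original code.

```python
import numpy as np


def count_unique_colors(img):
    """Count distinct RGB triples in an image array of shape (height, width, 3)."""
    pixels = img.reshape(-1, 3)  # one row per pixel
    return np.unique(pixels, axis=0).shape[0]


# Synthetic 2x2 image standing in for a real photo: three distinct colors
demo_img = np.array([[[255, 0, 0], [255, 0, 0]],
                     [[0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
print(count_unique_colors(demo_img))  # prints 3
```

A typical photograph contains many thousands of distinct colors, whereas the compressed image produced below contains at most K.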

Loading and Preprocessing the Image

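We begin by loading the original image with the matplotlib library. The image is stored as a three-dimensional array of shape (height, width, 3), where the first two dimensions index the pixel position and the third holds the RGB values. For a JPEG, plt.imread returns 8-bit integers in the range 0 to 255, so we scale them to floating-point values in [0, 1]; this lets plt.imshow display the averaged centroid colors later without clipping. We then reshape the image into a two-dimensional matrix with one row per pixel to prepare it for K-means clustering.

import numpy as np
import matplotlib.pyplot as plt
# Load the original image (a JPEG is read as uint8 values in 0..255)
original_img = plt.imread('image.jpg')
# Scale to floats in [0, 1] so the averaged colors display correctly with imshow
original_img = original_img.astype(np.float64) / 255.0
# Reshape to one row per pixel: (height * width, 3)
X_img = np.reshape(original_img, (original_img.shape[0] * original_img.shape[1], 3))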

Applying K-means Clustering

Next, we run the K-means algorithm on the image data to identify the most representative colors. We specify the number of clusters (K) to be used in the compression process. After running K-means, we obtain the centroids representing the selected colors and the indices of the closest centroid for each pixel in the image.

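K = 16          # number of colors to keep
max_iters = 10  # upper bound on assignment/update iterations
# kMeans_init_centroids and run_kMeans are helper functions defined separately
# (they are not part of NumPy or matplotlib)
# Initialize centroids
initial_centroids = kMeans_init_centroids(X_img, K)
# Run K-means algorithm
centroids, idx = run_kMeans(X_img, initial_centroids, max_iters)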

Compressing the Image

With the centroids and indices obtained from K-means clustering, we compress the original image by replacing each pixel with the color of its nearest centroid. The result contains at most K distinct colors (16 here), so each pixel can be described by a small index into the list of centroids instead of a full RGB triple.

# Find the closest centroid for each pixel
idx = find_closest_centroids(X_img, centroids)
# Replace each pixel with the color of the closest centroid
X_recovered = centroids[idx, :]
# Reshape the compressed image matrix
X_recovered = np.reshape(X_recovered, original_img.shape)
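
The storage saving comes from keeping only the palette of K centroid colors plus one palette index per pixel. The arithmetic can be sketched as below; the function name compressed_size_bits and the 128x128 example dimensions are assumptions for illustration, and the estimate ignores file-format overhead and any further encoding.

```python
import math


def compressed_size_bits(height, width, K):
    """Bits needed for a K-color palette plus one palette index per pixel."""
    palette_bits = K * 24  # K colors, 8 bits per channel
    bits_per_index = max(1, math.ceil(math.log2(K)))
    return palette_bits + height * width * bits_per_index


height, width, K = 128, 128, 16  # example dimensions, assumed
original_bits = height * width * 24  # 24 bits per pixel for uncompressed RGB
compressed_bits = compressed_size_bits(height, width, K)
print(original_bits, compressed_bits, round(original_bits / compressed_bits, 2))
```

For these example numbers the palette-plus-index representation is roughly six times smaller than the raw 24-bit representation. Note that X_recovered itself is still a full RGB array; the saving only materializes if the image is stored as idx plus centroids.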

Visualizing the Results

Finally, we visualize the original and compressed images to observe the effects of image compression. Despite reducing the number of unique colors, the compressed image retains most of the visual characteristics of the original, albeit with some compression artifacts.

# Display the original and compressed images
plt.subplot(1, 2, 1)
plt.imshow(original_img)
plt.title('Original Image')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(X_recovered)
plt.title('Compressed Image')
plt.axis('off')
plt.show()
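
Alongside visual inspection, a numeric error measure can help when comparing different values of K. The sketch below computes the mean squared error and the peak signal-to-noise ratio (PSNR) between two images; the function names are hypothetical, and it assumes both arrays have the same shape and value scale.

```python
import numpy as np


def mean_squared_error(original, compressed):
    """Average squared difference per channel value between two same-shape images."""
    original = np.asarray(original, dtype=float)
    compressed = np.asarray(compressed, dtype=float)
    return float(np.mean((original - compressed) ** 2))


def psnr(original, compressed, max_value=1.0):
    """Peak signal-to-noise ratio in decibels; higher means closer to the original."""
    mse = mean_squared_error(original, compressed)
    if mse == 0.0:
        return float('inf')  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Tiny synthetic example: every channel value is off by 0.1
a = np.zeros((2, 2, 3))
b = np.full((2, 2, 3), 0.1)
print(mean_squared_error(a, b), psnr(a, b))
```

With the variables from this post, the call would look like psnr(original_img, X_recovered); pass max_value=255 if the pixel values are on the 0 to 255 scale rather than scaled to [0, 1].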

Conclusion

In this blog post, we explored image compression using the K-means clustering algorithm. By identifying K representative colors and replacing each pixel with its nearest centroid, we reduced the image to a small color palette: with K = 16, each pixel can be stored as a 4-bit index instead of a 24-bit RGB value, plus a 16-entry palette. This is a lossy technique, so some color detail is discarded, but the overall appearance of the image is largely preserved. Image compression with K-means clustering is a simple way to trade a small amount of color fidelity for a substantial reduction in storage.

I hope you found this exploration of image compression with K-means clustering insightful. Feel free to experiment with different images and compression parameters to further explore the capabilities of this technique.

Here is the GitHub link for reference. Keep learning!

https://github.com/rutikkpatel/Practice-Assignments-Machine-Learning-4
