k-means clustering

Introduction

k-means is one of the simplest and most widely used clustering algorithms in unsupervised learning. It partitions a dataset into k disjoint clusters, seeking to minimise the within-cluster sum of squared distances between each point and its assigned centroid (WCSS, also called inertia). In this lesson you will see exactly how Lloyd's algorithm works, why k-means++ initialisation (introduced in 2007) has become a common default, when the elbow and silhouette methods break down, and which conditions (roughly spherical clusters, comparable variance, comparably scaled features) must hold for the results to be meaningful.
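To make the objective concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. It is an illustration under stated assumptions, not a reference implementation: the function names (`sq_dist`, `assign`, `update`, `wcss`, `lloyd`) and the toy dataset are hypothetical, the initial centroids are drawn uniformly at random rather than with k-means++, and an empty cluster simply keeps its previous centroid.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(m[d] for m in members) / len(members)
                                   for d in range(len(old))))
    return new_centroids

def wcss(points, labels, centroids):
    """Within-cluster sum of squared distances (inertia)."""
    return sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))

def lloyd(points, k, max_iter=100, seed=0):
    """Alternate assignment and update until the labels stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # plain random init, NOT k-means++
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels, wcss(points, labels, centroids)

# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels, inertia = lloyd(points, k=2)
print(labels, round(inertia, 4))
```

On this toy dataset the two groups are far enough apart that the loop recovers them regardless of which two points are drawn as initial centroids. On real data the result depends on the initialisation, which is why a better seeding scheme matters.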