Evaluation of Clustering Performance in Scikit-Learn

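Scikit-Learn provides several functions for evaluating the performance of clustering algorithms. Most of the metrics below compare predicted cluster labels with ground-truth labels. The Silhouette Coefficient is the exception: it needs only the data and the predicted labels.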

1. Adjusted Rand Index

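The Adjusted Rand Index (ARI) measures the similarity between two clusterings, for example the true labels and the predicted labels. It considers every pair of samples and counts the pairs that the two clusterings assign consistently, either to the same cluster or to different clusters. The score is adjusted for chance, so random labelings score close to 0 and identical clusterings score 1.0. The following example shows how to compute it.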

from sklearn.metrics.cluster import adjusted_rand_score
   
labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

print(adjusted_rand_score(labels_defined, labels_prediction))

Output

0.4444444444444444
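
To make the pair counting concrete, the sketch below recomputes the ARI for the same six labels directly from pair counts, using the standard chance-adjusted formula, and compares the result with adjusted_rand_score. The helper name ari_from_pairs is hypothetical. The sketch assumes the index is well defined, so degenerate inputs divide by zero (for example, when both labelings put every sample in one cluster).

```python
from collections import Counter
from math import comb

from sklearn.metrics.cluster import adjusted_rand_score


def ari_from_pairs(labels_true, labels_pred):
    """Hypothetical helper: ARI computed from pair counts.

    Assumes the index is well defined; degenerate inputs (for example,
    both labelings putting every sample in one cluster) divide by zero.
    """
    n = len(labels_true)
    # Pairs placed together by both labelings: one term per contingency cell
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together by each labeling on its own
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    # Value of together_both expected by chance, and its maximum
    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2
    return (together_both - expected) / (maximum - expected)


labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

print(ari_from_pairs(labels_defined, labels_prediction))
print(adjusted_rand_score(labels_defined, labels_prediction))
```

For these labels, 3 pairs are together in both labelings, 7 are together in the true labels and 3 in the prediction. The expected count is 7 × 3 / 15 = 1.4 and the maximum is 5. The index is therefore (3 - 1.4) / (5 - 1.4) = 4/9, the value printed above.
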
2. Mutual Information

Mutual Information (MI) measures the agreement between two label assignments and ignores permutations of the labels. Scikit-Learn provides normalized and adjusted versions, described below.
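
The following sketch illustrates permutation invariance with mutual_info_score, which returns the unnormalized MI in nats. Renaming the predicted clusters leaves the score unchanged. The entropy helper and the relabelled list are illustrative assumptions and are not part of the examples that follow.

```python
import math
from collections import Counter

from sklearn.metrics.cluster import mutual_info_score


def entropy(labels):
    """Illustrative helper: Shannon entropy of a labeling, in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]
# Same partition as labels_prediction, with the cluster names shuffled
relabelled = [3, 3, 0, 0, 2, 2]

mi = mutual_info_score(labels_defined, labels_prediction)
mi_relabelled = mutual_info_score(labels_defined, relabelled)
print(mi, mi_relabelled)

# Every predicted cluster lies inside one true class, so the prediction
# removes all uncertainty about the true label and MI equals H(true)
print(entropy(labels_defined))
```

Both MI values are equal, and both match the entropy of labels_defined (about 0.6365 nats). Raw MI has no fixed upper bound because it depends on the entropies of the labelings, which is why the normalized and adjusted variants exist.
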

2.1. Normalized Mutual Information (NMI)

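Normalized Mutual Information (NMI) scales MI to the range 0 to 1, where 1 means the two assignments are identical up to relabelling. NMI is not corrected for chance. The example below shows how to compute it.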

from sklearn.metrics.cluster import normalized_mutual_info_score
   
labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

print(normalized_mutual_info_score(labels_defined, labels_prediction))

Output

0.7336804366512113
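
NMI divides MI by a generalized mean of the two label entropies. The sketch below recomputes the score by hand and compares it with the library. It assumes scikit-learn 0.22 or later, where average_method defaults to "arithmetic", and the entropy helper is illustrative.

```python
import math
from collections import Counter

from sklearn.metrics.cluster import mutual_info_score, normalized_mutual_info_score


def entropy(labels):
    """Illustrative helper: Shannon entropy of a labeling, in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

mi = mutual_info_score(labels_defined, labels_prediction)
h_true = entropy(labels_defined)
h_pred = entropy(labels_prediction)

# Default normalization: divide MI by the arithmetic mean of the entropies
nmi_manual = mi / ((h_true + h_pred) / 2)
nmi_library = normalized_mutual_info_score(labels_defined, labels_prediction)
print(nmi_manual, nmi_library)

# Other normalizers are selected with the average_method parameter
nmi_geometric = normalized_mutual_info_score(
    labels_defined, labels_prediction, average_method="geometric"
)
print(nmi_geometric, mi / math.sqrt(h_true * h_pred))
```
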
2.2. Adjusted Mutual Information (AMI)

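Adjusted Mutual Information (AMI) corrects MI for chance. Random label assignments score close to 0 on average, and identical assignments score 1.0. The following example shows how to compute it.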

from sklearn.metrics.cluster import adjusted_mutual_info_score

labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

print(adjusted_mutual_info_score(labels_defined, labels_prediction))

Output

0.6153846153846159
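
The chance correction matters most when there are many clusters relative to the number of samples. The sketch below scores two independent random labelings. The seed, the 100 samples and the 10 clusters are arbitrary choices made for illustration. NMI typically comes out noticeably above 0, while AMI stays close to 0.

```python
import numpy as np
from sklearn.metrics.cluster import adjusted_mutual_info_score, normalized_mutual_info_score

# Two independent random labelings: any agreement between them is pure chance
rng = np.random.default_rng(0)
labels_random_a = rng.integers(0, 10, size=100)
labels_random_b = rng.integers(0, 10, size=100)

nmi = normalized_mutual_info_score(labels_random_a, labels_random_b)
ami = adjusted_mutual_info_score(labels_random_a, labels_random_b)
print(f"NMI: {nmi:.3f}  AMI: {ami:.3f}")
```
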
3. Fowlkes-Mallows Score

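The Fowlkes-Mallows Index (FMI) quantifies the similarity between two clusterings as the geometric mean of pairwise precision and recall. A pair of samples counts as a true positive when both clusterings put the two samples in the same cluster. The score ranges from 0 to 1, and higher values indicate greater similarity. The example below shows how to use it.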

from sklearn.metrics.cluster import fowlkes_mallows_score

labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

print(fowlkes_mallows_score(labels_defined, labels_prediction))

Output

0.6546536707079771
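
The pairwise precision and recall can be checked by enumerating every pair of samples. The sketch below uses a hypothetical pair_counts helper to count true positives, false positives and false negatives. It is quadratic in the number of samples, so it is only suitable for small illustrations like this one.

```python
import math
from itertools import combinations

from sklearn.metrics.cluster import fowlkes_mallows_score


def pair_counts(labels_true, labels_pred):
    """Hypothetical helper: count TP, FP and FN over all pairs of samples."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # together in both clusterings
        elif same_pred:
            fp += 1  # together only in the prediction
        elif same_true:
            fn += 1  # together only in the true labels
    return tp, fp, fn


labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

tp, fp, fn = pair_counts(labels_defined, labels_prediction)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn)
print(math.sqrt(precision * recall))
print(fowlkes_mallows_score(labels_defined, labels_prediction))
```

Here TP = 3, FP = 0 and FN = 4. Precision is therefore 1 and recall is 3/7, and their geometric mean, the square root of 3/7 (about 0.6547), matches the score above.
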
4. Silhouette Coefficient

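The silhouette_score function calculates the mean Silhouette Coefficient over all samples. For each sample, the coefficient is (b - a) / max(a, b). Here a is the mean intra-cluster distance, and b is the mean nearest-cluster distance, that is, the mean distance to the samples of the nearest cluster that the sample does not belong to. Values range from -1 to 1, and higher values indicate dense, well-separated clusters. Unlike the metrics above, this one does not require ground-truth labels. The following example applies it to a KMeans clustering of the iris dataset.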

from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

dataset = datasets.load_iris()
data = dataset.data

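# Fit k-means with three clusters; random_state makes the run repeatable.
# The default n_init changed in scikit-learn 1.4, so the score may differ slightly across versions.
kmeans = KMeans(n_clusters=3, random_state=1).fit(data)
labels = kmeans.labels_
# Mean silhouette over all samples, using Euclidean distance
print(silhouette_score(data, labels, metric='euclidean'))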

Output

0.551191604619592
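
To see the formula at work, the sketch below computes per-sample silhouettes by hand and compares them with silhouette_samples and silhouette_score. It uses a small two-group toy dataset whose values are illustrative. The helper silhouette_manual is hypothetical and assumes that every cluster has at least two samples. scikit-learn assigns 0 to samples in singleton clusters, and this sketch does not handle that case.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score


def silhouette_manual(X, labels):
    """Hypothetical helper: per-sample silhouette with Euclidean distance.

    Assumes every cluster has at least two samples (singleton clusters,
    which scikit-learn scores as 0, are not handled here).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise Euclidean distance matrix
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.empty(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the sample itself from its own cluster
        a = dist[i, same].mean()  # mean intra-cluster distance
        # Mean distance to each other cluster; keep the nearest one
        b = min(dist[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores


# Small toy dataset with two well-separated groups (illustrative values)
X = [[0, 0], [0, 1], [5, 5], [5, 6], [6, 5]]
labels = [0, 0, 1, 1, 1]

manual = silhouette_manual(X, labels)
print(np.round(manual, 3))
print(np.allclose(manual, silhouette_samples(X, labels)))
print(manual.mean(), silhouette_score(X, labels))
```
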
5. Contingency Matrix

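For every (true class, predicted cluster) pair, the contingency matrix reports the number of samples that belong to both. Rows correspond to the true classes and columns to the predicted clusters, so the matrix need not be square. The example below yields two rows (classes 'a' and 'b') and three columns (clusters 0, 1 and 2).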

from sklearn.metrics.cluster import contingency_matrix

labels_true = ['a', 'a', 'a', 'b', 'b', 'b']
labels_pred = [0, 0, 2, 1, 1, 0]

print(contingency_matrix(labels_true, labels_pred))

Output

[[2 0 1]
 [1 2 0]]
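
Rows follow the true labels and columns follow the predicted clusters, so the matrix can be read back into per-class counts. The minimal sketch below reuses the labels from the example above. It also shows the marginal sums and the sparse=True option of contingency_matrix.

```python
import numpy as np
from sklearn.metrics.cluster import contingency_matrix

labels_true = ['a', 'a', 'a', 'b', 'b', 'b']
labels_pred = [0, 0, 2, 1, 1, 0]

matrix = contingency_matrix(labels_true, labels_pred)

# Rows and columns follow the sorted unique labels of each argument
classes = np.unique(labels_true).tolist()   # ['a', 'b']
clusters = np.unique(labels_pred).tolist()  # [0, 1, 2]
for cls, row in zip(classes, matrix):
    print(cls, dict(zip(clusters, row.tolist())))

# Marginal sums recover the class sizes and the cluster sizes
print("class sizes:", matrix.sum(axis=1))
print("cluster sizes:", matrix.sum(axis=0))

# sparse=True returns a SciPy sparse matrix instead of a dense array
sparse_matrix = contingency_matrix(labels_true, labels_pred, sparse=True)
print(sparse_matrix.toarray())
```

Row 'a' reads {0: 2, 1: 0, 2: 1}: two samples of class 'a' fall in cluster 0 and one falls in cluster 2.
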
