Evaluation of Clustering Performance in Scikit-Learn

admin

February 21, 2026

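Scikit-Learn provides several functions for evaluating the performance of clustering algorithms, as explained below. Most of them compare predicted cluster labels with reference labels; the Silhouette Coefficient needs only the data and the predicted labels.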

1. Adjusted Rand Index

The Adjusted Rand Index (ARI) measures the similarity between two clusterings by counting the pairs of samples that are placed in the same or in different clusters in both, then correcting that count for chance. The following example shows how to use it.

from sklearn.metrics.cluster import adjusted_rand_score
   
labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

print(adjusted_rand_score(labels_defined, labels_prediction))

Output

0.4444444444444444
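
To make the pair counting concrete, the sketch below recomputes the score by hand from the contingency table of the two labelings. It assumes the standard ARI formula (observed pair agreement minus its expected value, divided by the maximum minus the expected value); the variable names are illustrative only.

```python
from math import comb

from sklearn.metrics.cluster import adjusted_rand_score, contingency_matrix

labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

# Rows are reference classes, columns are predicted clusters.
table = contingency_matrix(labels_defined, labels_prediction)
n_samples = int(table.sum())

# Number of sample pairs that share a cluster in both labelings,
# in the reference labeling, and in the predicted labeling.
pairs_both = sum(comb(int(n), 2) for n in table.ravel())
pairs_defined = sum(comb(int(n), 2) for n in table.sum(axis=1))
pairs_prediction = sum(comb(int(n), 2) for n in table.sum(axis=0))

# Agreement expected by chance for labelings with these cluster sizes.
expected = pairs_defined * pairs_prediction / comb(n_samples, 2)
maximum = (pairs_defined + pairs_prediction) / 2

manual_ari = (pairs_both - expected) / (maximum - expected)

print(manual_ari)
print(adjusted_rand_score(labels_defined, labels_prediction))
```

Both printed values should agree up to floating-point rounding.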

2. Mutual Information

Mutual Information measures the agreement between two label assignments, ignoring permutations of the label values. Scikit-Learn offers several versions of it, two of which are described below.

2.1. Normalized Mutual Information (NMI)

NMI rescales Mutual Information to the range 0 to 1. The example below shows how to use it for performance evaluation.

from sklearn.metrics.cluster import normalized_mutual_info_score
   
labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

print(normalized_mutual_info_score(labels_defined, labels_prediction))

Output

0.7336804366512113
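
As a rough check on what the normalization does, the sketch below divides the raw mutual information by the arithmetic mean of the two label entropies. It assumes the `arithmetic` averaging method, which is passed explicitly here; `label_entropy` is a hypothetical helper written for this illustration, not part of Scikit-Learn.

```python
import numpy as np
from sklearn.metrics import mutual_info_score
from sklearn.metrics.cluster import normalized_mutual_info_score

labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]


def label_entropy(labels):
    """Shannon entropy (natural logarithm) of a label assignment."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


# Raw mutual information in nats.
mi = mutual_info_score(labels_defined, labels_prediction)

# Normalize by the arithmetic mean of the two entropies.
mean_entropy = np.mean(
    [label_entropy(labels_defined), label_entropy(labels_prediction)]
)
manual_nmi = mi / mean_entropy

reference_nmi = normalized_mutual_info_score(
    labels_defined, labels_prediction, average_method="arithmetic"
)
print(manual_nmi)
print(reference_nmi)
```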

2.2. Adjusted Mutual Information (AMI)

AMI additionally corrects Mutual Information for agreement that arises by chance. The following example illustrates how to use it for performance evaluation.

from sklearn.metrics.cluster import adjusted_mutual_info_score

labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

print(adjusted_mutual_info_score(labels_defined, labels_prediction))

Output

0.6153846153846159
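
The effect of the chance correction is easiest to see on two unrelated labelings. The sketch below draws two independent random assignments (the seed, the sample size of 100, and the 10 clusters are arbitrary choices for illustration) and compares both scores: NMI stays clearly above zero, while AMI is expected to land close to zero.

```python
import numpy as np
from sklearn.metrics.cluster import (
    adjusted_mutual_info_score,
    normalized_mutual_info_score,
)

# Two independent random labelings: any agreement is due to chance.
rng = np.random.default_rng(0)
labels_a = rng.integers(0, 10, size=100)
labels_b = rng.integers(0, 10, size=100)

nmi = normalized_mutual_info_score(labels_a, labels_b)
ami = adjusted_mutual_info_score(labels_a, labels_b)

print("NMI:", nmi)
print("AMI:", ami)
```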

3. Fowlkes-Mallows Score

The Fowlkes-Mallows score quantifies the similarity between two clusterings of a set of points as the geometric mean of pairwise precision and recall. The example below shows how to use it.

from sklearn.metrics.cluster import fowlkes_mallows_score

labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

print(fowlkes_mallows_score(labels_defined, labels_prediction))

Output

0.6546536707079771
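
The pairwise precision and recall can be counted directly by enumerating every pair of samples. The sketch below assumes the usual convention that a pair counts as a positive when both samples share a cluster; the counter names are illustrative.

```python
from itertools import combinations
from math import sqrt

from sklearn.metrics.cluster import fowlkes_mallows_score

labels_defined = [0, 0, 1, 1, 1, 1]
labels_prediction = [0, 0, 2, 2, 3, 3]

same_both = same_defined = same_prediction = 0
for i, j in combinations(range(len(labels_defined)), 2):
    in_defined = labels_defined[i] == labels_defined[j]
    in_prediction = labels_prediction[i] == labels_prediction[j]
    same_defined += in_defined
    same_prediction += in_prediction
    same_both += in_defined and in_prediction

# Share of predicted same-cluster pairs that are also reference pairs,
# and share of reference same-cluster pairs that were recovered.
precision = same_both / same_prediction
recall = same_both / same_defined

manual_fmi = sqrt(precision * recall)
print(manual_fmi)
print(fowlkes_mallows_score(labels_defined, labels_prediction))
```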

4. Silhouette Coefficient

The silhouette_score function in Scikit-Learn calculates the mean Silhouette Coefficient over all samples. For each sample, the coefficient is (b - a) / max(a, b), where a is the mean intra-cluster distance and b is the mean nearest-cluster distance. The following example shows how to use it on the iris dataset.

from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

dataset = datasets.load_iris()
data = dataset.data

kmeans = KMeans(n_clusters=3, random_state=1).fit(data)
labels = kmeans.labels_
print(silhouette_score(data, labels, metric='euclidean'))

Output

0.551191604619592
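
The mean score can hide poorly placed samples, so it is sometimes useful to look at the per-sample coefficients as well. The sketch below uses `silhouette_samples` and checks that its mean matches `silhouette_score`; setting `n_init=10` explicitly is an assumption made here to keep the behaviour the same across Scikit-Learn versions.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

data = datasets.load_iris().data
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(data)

# One coefficient per sample, each in the range [-1, 1].
per_sample = silhouette_samples(data, labels, metric="euclidean")
overall = silhouette_score(data, labels, metric="euclidean")

print("lowest per-sample value:", per_sample.min())
print("highest per-sample value:", per_sample.max())
print("mean equals silhouette_score:", np.isclose(per_sample.mean(), overall))
```

Samples with a coefficient near zero or below it sit between clusters or closer to a neighbouring cluster than to their own.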

5. Contingency Matrix

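The contingency matrix reports the intersection cardinality for every pair of true and predicted clusters: entry (i, j) counts the samples that belong to true class i and predicted cluster j. The matrix is square only when both labelings contain the same number of distinct labels. The example below shows how to compute it; the first argument holds the true labels and the second the predicted labels.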

from sklearn.metrics.cluster import contingency_matrix

labels_true = ['a', 'a', 'a', 'b', 'b', 'b']
labels_pred = [0, 0, 2, 1, 1, 0]

print(contingency_matrix(labels_true, labels_pred))

Output

[[2 0 1]
 [1 2 0]]
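
A quick way to read the matrix is through its shape and its margins. The sketch below repeats the same inputs under the illustrative names `labels_true` and `labels_pred` and shows that the row sums give the size of each true class and the column sums give the size of each predicted cluster.

```python
from sklearn.metrics.cluster import contingency_matrix

labels_true = ['a', 'a', 'a', 'b', 'b', 'b']
labels_pred = [0, 0, 2, 1, 1, 0]

table = contingency_matrix(labels_true, labels_pred)

# Two true classes and three predicted clusters: 2 rows, 3 columns.
print(table.shape)

# Row sums: number of samples in each true class.
print(table.sum(axis=1))

# Column sums: number of samples in each predicted cluster.
print(table.sum(axis=0))
```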


AysamByte
