Anomaly detection identifies data points that deviate from the norm. Anomalies fall into three categories: point anomalies, where an individual data instance is anomalous compared to the rest of the data; contextual anomalies, where a data instance is anomalous only within a specific context; and collective anomalies, where a group of related data instances is anomalous with respect to the entire dataset.
Outlier detection and novelty detection are two distinct approaches to anomaly detection. In outlier detection, the training data contains outliers, defined as observations that deviate from the others; the estimators ignore them and fit the concentrated regions of the data, a setting known as unsupervised anomaly detection. In novelty detection, the training data is assumed to be clean of outliers, and the goal is to decide whether a new observation follows a pattern not seen during training; this setting is termed semi-supervised anomaly detection.
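As a minimal sketch of the distinction, Scikit-Learn's LocalOutlierFactor supports both modes through its novelty parameter; the tiny dataset below is an illustrative assumption, not taken from the examples that follow.
from sklearn.neighbors import LocalOutlierFactor
train = [[0.0], [0.1], [0.2], [0.1]]  # assumed clean training data
new = [[0.15], [5.0]]                 # new observations to score
# novelty=True switches from outlier detection (fit_predict on the
# training data itself) to novelty detection (predict on unseen data)
detector = LocalOutlierFactor(n_neighbors=2, novelty=True).fit(train)
print(detector.predict(new))          # 1 for inliers, -1 for novelties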
The Scikit-Learn algorithms for outlier detection are described below.
Elliptic Envelope
This algorithm assumes the regular data follows a known distribution, such as a Gaussian. The EllipticEnvelope object fits a robust covariance estimate to the data, drawing an ellipse around the central data points while ignoring points outside it. The algorithm is used in the following example.
import numpy as np
from sklearn.covariance import EllipticEnvelope
# A covariance matrix must be symmetric
true_covariance = np.array([[.7, .2], [.2, .8]])
# Draw 1000 samples from a bivariate Gaussian with a fixed random seed
data = np.random.RandomState(0).multivariate_normal(mean=[0, 1], cov=true_covariance, size=1000)
# Fit a robust covariance estimate to the data
covariance = EllipticEnvelope(random_state=0).fit(data)
# The predict method returns 1 for an inlier and -1 for an outlier
print(covariance.predict([[1, 1], [2, 2]]))
Output
[ 1 -1]
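Once fitted, the estimator also exposes the robust estimates it computed, which can be compared against the true parameters used to generate the data; location_ and covariance_ are Scikit-Learn's documented attribute names.
print(covariance.location_)    # robust estimate of the data's centre
print(covariance.covariance_)  # robust estimate of the covariance matrix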
Isolation Forest
The Isolation Forest algorithm detects outliers efficiently, especially in high-dimensional datasets. Scikit-Learn's IsolationForest isolates observations by randomly selecting a feature and then a split value between that feature's minimum and maximum. The number of splits required to isolate a sample equals the path length from the root node to the terminating node, and anomalies tend to produce noticeably shorter paths. The example below fits a forest of 50 trees to the given data.
import numpy as np
from sklearn.ensemble import IsolationForest
# A small dataset in which [30, -40] is clearly isolated from the rest
data = np.array([[0, 0], [-5, -6], [1, 5], [1, 2], [30, -40]])
# Fit an ensemble of 50 randomly grown isolation trees
classifier = IsolationForest(n_estimators=50)
classifier.fit(data)
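The fitted forest can then label the observations; a brief continuation of the example (predict returns 1 for inliers and -1 for outliers, so the isolated point [30, -40] should be flagged).
print(classifier.predict(data))            # 1 for inliers, -1 for outliers
print(classifier.decision_function(data))  # lower scores are more anomalous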
Local Outlier Factor
Local Outlier Factor (LOF) is an effective outlier detection algorithm for moderately high-dimensional data. The method computes a score, the local outlier factor, reflecting how much lower a sample's density is than that of its neighbors; samples with substantially lower density are flagged as anomalies. LOF is built on nearest-neighbor queries, and the following example uses the related NearestNeighbors class to find the nearest neighbor of a query point.
from sklearn.neighbors import NearestNeighbors
samples = [[2., 2., 4.], [1., 0., .5], [0., 0., 0.]]
# Index the samples with a ball tree, using the Manhattan distance (p=1)
neighbors = NearestNeighbors(n_neighbors=1, algorithm="ball_tree", p=1)
neighbors.fit(samples)
# kneighbors returns the distance to, and the index of, the nearest sample
print(neighbors.kneighbors([[1., .4, .5]]))
Output
(array([[0.4]]), array([[1]]))
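LOF itself is available as the LocalOutlierFactor class; the sketch below applies it to the same samples (n_neighbors=2 is an assumption chosen to suit the tiny dataset).
from sklearn.neighbors import LocalOutlierFactor
detector = LocalOutlierFactor(n_neighbors=2)
# fit_predict labels each training sample: 1 for inliers, -1 for outliers
print(detector.fit_predict(samples))
# negative_outlier_factor_ holds the opposite of the LOF score;
# values far below -1 indicate a much lower local density
print(detector.negative_outlier_factor_)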
One-Class SVM
This unsupervised outlier detection algorithm estimates the support of a high-dimensional distribution. It requires a kernel (commonly RBF) and a scalar parameter, nu, to define the frontier. The example below fits the dataset with the OneClassSVM object and prints each sample's raw score.
from sklearn.svm import OneClassSVM
data = [[0.65], [0.70], [0], [1], [0.45]]
# gamma='scale' sets the RBF kernel coefficient from the data's variance
classifier = OneClassSVM(gamma='scale').fit(data)
# score_samples returns the raw, unshifted score of each sample
print(classifier.score_samples(data))
Output
[0.94045373 0.94436003 0.94127002 0.94127001 0.94066049]
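The fitted object can also label the points directly; a short continuation of the example (note that nu, an upper bound on the fraction of training errors, defaults to 0.5, so up to half the samples may be labelled outliers).
print(classifier.predict(data))            # 1 for inliers, -1 for outliers
print(classifier.decision_function(data))  # signed distance to the frontier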
