Neighbor-based learning methods include supervised and unsupervised types. Supervised neighbor-based methods are primarily used for classification, though they are also applicable to regression. These methods are lazy, requiring no specialized training phase and retaining all available data, and non-parametric, making no assumptions about the underlying data distribution. Nearest neighbor methods identify the training samples closest to a new data point and predict its label from them.
1. Types of Algorithms
Different types of algorithms can be utilized in neighbor-based methods, as follows (a short selection sketch follows the list):
- Brute Force: Brute-force distance computation for neighbor search is effective for small datasets but becomes impractical as the number of samples increases significantly.
- K-D Tree: K-D tree is a binary tree used to improve computational efficiency by recursively partitioning the parameter space along the data axes, creating nested axis-aligned (orthotropic) regions into which data points are filed for better organization and retrieval.
- Ball Tree: Ball tree data structure was developed to overcome K-D Tree’s inefficiency in higher dimensions. It recursively divides data into nodes defined by a centroid and radius, ensuring all points within a node lie inside the respective hyper-sphere.
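All three strategies are available in Scikit-Learn through the algorithm parameter of the NearestNeighbors class. The following minimal sketch (the five-point array and the query point are illustrative choices, not from the original examples) shows that each strategy returns the same neighbors; only the internal index structure differs.
import numpy as np
from sklearn.neighbors import NearestNeighbors

points = np.array([[0, 1], [1, 1], [3, 2], [4, 0], [2, 4]])

# Serve the same query with each strategy; results are identical,
# only the internal search structure changes.
for method in ('brute', 'kd_tree', 'ball_tree'):
    nn = NearestNeighbors(n_neighbors=2, algorithm=method).fit(points)
    distances, indices = nn.kneighbors([[2, 1]])
    print(method, indices[0], distances[0])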
2. Selecting Nearest Neighbors Algorithm
Selecting the ideal algorithm for a given dataset depends on the following factors; a sketch after the list shows how Scikit-Learn can automate this choice:
- Number of samples (N) and dimensionality (D): Query time is the key consideration. Brute Force query time grows as O[DN]. K-D Tree query time is roughly O[D log(N)] for low dimensionality (D < 20), but as D grows beyond about 20 its cost approaches O[DN]. Ball Tree query time remains roughly O[D log(N)] even in higher dimensions, which makes dimensionality a decisive factor when choosing the algorithm.
- Data structure: The intrinsic dimensionality and sparsity of the data strongly influence Ball Tree and K-D Tree query times; both perform better on sparser data with lower intrinsic dimensionality. Brute Force query time, in contrast, is unaffected by data structure.
- Number of neighbors (K): Increasing the number of neighbors for query points slows query times for Ball Tree and K-D Tree algorithms, while Brute Force’s query time remains unchanged regardless of this number.
- Number of query points: Building a K-D Tree or Ball Tree pays off only when its construction cost is amortized over many queries; when only a few queries are needed, the Brute Force algorithm can outperform both, since it has no construction phase.
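In practice, Scikit-Learn can weigh these factors automatically: passing algorithm='auto' to NearestNeighbors lets the library choose among brute force, K-D tree, and ball tree based on the data passed to fit. A minimal sketch (the random sample sizes are assumed for illustration):
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
samples = rng.rand(1000, 3)  # e.g. N = 1000 samples in D = 3 dimensions

# 'auto' selects a search strategy from the number of samples,
# the dimensionality, and other properties of the fitted data.
nn = NearestNeighbors(n_neighbors=5, algorithm='auto').fit(samples)
distances, indices = nn.kneighbors(samples[:1])
print(indices)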
3. Types of KNN Learning Techniques
KNN is a non-parametric, lazy machine learning algorithm: it assumes no particular data distribution and uses the entire training dataset at prediction time, with no separate model-generation phase. It works in two phases: (i) computing and storing the K nearest neighbors of the training samples, and (ii) retrieving the K nearest neighbors of a query point from the dataset and predicting its class by majority vote among those neighbors.
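The two phases can be made concrete with a short from-scratch sketch (a simplified illustration, not Scikit-Learn's implementation; the toy arrays are assumed):
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    # Phase (i): compute distances from the query to all training samples
    distances = np.linalg.norm(train_x - query, axis=1)
    # Phase (ii): take the K nearest and vote on the majority class
    nearest = np.argsort(distances)[:k]
    return Counter(train_y[nearest]).most_common(1)[0][0]

train_x = np.array([[0, 0], [1, 1], [5, 5], [6, 5]])
train_y = np.array([0, 0, 1, 1])
print(knn_predict(train_x, train_y, np.array([5, 6])))  # predicts class 1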
Scikit-Learn implements K-nearest neighbors for both supervised and unsupervised learning. Unsupervised nearest neighbors uses an algorithm such as Brute Force, K-D Tree, or Ball Tree to identify the nearest neighbors of each sample, and forms the basis of methods like KNN and K-Means. Supervised neighbors-based learning, in contrast, is applied to classification and regression tasks.
3.1. Unsupervised KNN Learning
Because algorithms such as KNN and K-Means would otherwise recompute pairwise distances inefficiently, Scikit-Learn implements nearest neighbor search as its own estimator, NearestNeighbors, improving performance in unsupervised nearest neighbor learning. The following example demonstrates its use.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Five two-dimensional sample points
data = np.array([[2, 3], [-1, 2], [1, -3], [-1, -2], [1, 5]])

# Build a ball tree index and query the 3 nearest neighbors of each
# point (each point is its own nearest neighbor, at distance 0)
nearest_neighbors = NearestNeighbors(n_neighbors=3, algorithm='ball_tree')
nearest_neighbors.fit(data)
distances, indices = nearest_neighbors.kneighbors(data)
print(indices, "\n")
print(distances, "\n")

# Connectivity graph: entry (i, j) is 1 if j is among the
# 3 nearest neighbors of i
print(nearest_neighbors.kneighbors_graph(data).toarray())
Output
[[0 4 1]
[1 0 4]
[2 3 1]
[3 2 1]
[4 0 1]]
[[0. 2.23606798 3.16227766]
[0. 3.16227766 3.60555128]
[0. 2.23606798 5.38516481]
[0. 2.23606798 4. ]
[0. 2.23606798 3.60555128]]
[[1. 1. 0. 0. 1.]
[1. 1. 0. 0. 1.]
[0. 1. 1. 1. 0.]
[0. 1. 1. 1. 0.]
[1. 1. 0. 0. 1.]]
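The tree structures are also exposed directly as sklearn.neighbors.KDTree and sklearn.neighbors.BallTree, which can be queried without going through NearestNeighbors. A brief sketch on the same data:
import numpy as np
from sklearn.neighbors import KDTree

data = np.array([[2, 3], [-1, 2], [1, -3], [-1, -2], [1, 5]])

# Build the K-D tree once, then query the 3 nearest neighbors of each point
tree = KDTree(data)
distances, indices = tree.query(data, k=3)
print(indices)  # same neighbor sets as the example above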
3.2. Supervised KNN Learning
Supervised neighbors-based learning is applied to classification and regression tasks with labeled data. A neighbors-based classifier predicts by majority vote among the nearest neighbors; because it simply stores the training instances, it is a form of non-generalizing learning. Scikit-Learn provides two such classifiers, KNeighborsClassifier and RadiusNeighborsClassifier. A neighbors-based regressor, in contrast, handles continuous labels, predicting the mean of the labels of the nearest neighbors; the library provides KNeighborsRegressor and RadiusNeighborsRegressor.
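Since the examples below cover only the regressors, a minimal classification sketch using KNeighborsClassifier on the iris dataset may be helpful (the split proportion, random seed, and K value are illustrative choices):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
data_train, data_test, target_train, target_test = train_test_split(
    iris.data, iris.target, test_size=0.30, random_state=0)

# Predict each test label by majority vote among the 5 nearest
# training samples
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(data_train, target_train)
print("Test accuracy:", classifier.score(data_test, target_test))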
3.2.1. KNeighborsRegressor
The regressor predicts from the K nearest neighbors, where K is a user-specified integer. The best value of K is data-dependent, since it controls how local the prediction is. The following example illustrates the method on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
import numpy as np

iris = load_iris()
data = iris.data[:, :4]
target = iris.target
data_train, data_test, target_train, target_test = train_test_split(data, target, test_size=0.30)

# Standardize features using statistics computed on the training set only
scaler = StandardScaler()
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)

# Predict each test label as the mean over the 5 nearest training neighbors
regressor = KNeighborsRegressor(n_neighbors=5)
regressor.fit(data_train, target_train)
print("The mean squared error is:", np.power(target_test - regressor.predict(data_test), 2).mean())

# A second, one-dimensional toy example
data = [[0], [1], [2], [3]]
target = [0, 1, 0, 1]
regressor = KNeighborsRegressor(n_neighbors=3)
regressor.fit(data, target)
print(regressor.predict([[2.5]]))
Output
The mean squared error is: (value varies with the random train/test split)
[0.66666667]
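By default all K neighbors are weighted uniformly. KNeighborsRegressor also accepts weights='distance', which gives closer neighbors proportionally more influence; a brief sketch on the toy data above:
from sklearn.neighbors import KNeighborsRegressor

data = [[0], [1], [2], [3]]
target = [0, 1, 0, 1]

# Closer neighbors (here the points 2 and 3) dominate the prediction
regressor = KNeighborsRegressor(n_neighbors=3, weights='distance')
regressor.fit(data, target)
print(regressor.predict([[2.5]]))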
3.2.2. RadiusNeighborsRegressor
The radius regressor predicts from all neighbors that fall within a user-specified radius R of the query point, so the number of neighbors used can vary from query to query. The following example illustrates the method on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import RadiusNeighborsRegressor
import numpy as np

iris = load_iris()
data = iris.data[:, :4]
target = iris.target
data_train, data_test, target_train, target_test = train_test_split(data, target, test_size=0.30)

# Standardize features using statistics computed on the training set only
scaler = StandardScaler()
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)

# Predict each test label as the mean over all training neighbors
# within radius 1 of the (standardized) query point
regressor = RadiusNeighborsRegressor(radius=1)
regressor.fit(data_train, target_train)
print("The mean squared error is:", np.power(target_test - regressor.predict(data_test), 2).mean())

# A second, one-dimensional toy example
data = [[0], [1], [2], [3]]
target = [0, 1, 0, 1]
regressor = RadiusNeighborsRegressor(radius=1)
regressor.fit(data, target)
print(regressor.predict([[1.5]]))
Output
The mean squared error is: (value varies with the random train/test split)
[0.5]
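One caveat: unlike KNeighborsRegressor, the radius-based regressor may find no neighbors at all within the radius for an outlying query point, in which case Scikit-Learn returns NaN and emits a warning. A minimal sketch of this failure mode on the toy data above:
from sklearn.neighbors import RadiusNeighborsRegressor

data = [[0], [1], [2], [3]]
target = [0, 1, 0, 1]

regressor = RadiusNeighborsRegressor(radius=1)
regressor.fit(data, target)

# No training point lies within radius 1 of 10.0, so the
# prediction is NaN (and a warning is raised)
print(regressor.predict([[10.0]]))  # [nan]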