Dimensionality Reduction Using PCA in Scikit-Learn

Dimensionality reduction compresses a dataset by projecting it onto a smaller set of informative features. Principal Component Analysis (PCA) is one of the most widely used algorithms for this task, and Scikit-Learn provides it in several variants, described below.

Exact PCA

PCA reduces dimensionality linearly by applying Singular Value Decomposition (SVD) to centered input data. Scikit-Learn's PCA class works as a transformer: it learns n components during fitting, after which new data can be projected onto those components. The following example uses this class to find the top five principal components of the Pima Indians Diabetes dataset.

from pandas import read_csv
from sklearn.decomposition import PCA

path = "pima-indians-diabetes.csv"
headers = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataset = read_csv(path, names=headers)
values = dataset.values
# Skip the file's original header row and cast the values to floats
data = values[1:, 0:8].astype(float)
target = values[1:, 8]

pca = PCA(n_components=5)
fit = pca.fit(data)
print("The explained variance is: %s\n" % fit.explained_variance_ratio_)
print(fit.components_)

Output

The explained variance is: [0.88854663 0.06159078 0.02579012 0.01308614 0.00744094]

[[-2.02176587e-03  9.78115765e-02  1.60930503e-02  6.07566861e-02
   9.93110844e-01  1.40108085e-02  5.37167919e-04 -3.56474430e-03]
 [ 2.26488861e-02  9.72210040e-01  1.41909330e-01 -5.78614699e-02
  -9.46266913e-02  4.69729766e-02  8.16804621e-04  1.40168181e-01]
 [ 2.24649003e-02 -1.43428710e-01  9.22467192e-01  3.07013055e-01
  -2.09773019e-02  1.32444542e-01  6.39983017e-04  1.25454310e-01]
 [-4.90459604e-02  1.19830016e-01 -2.62742788e-01  8.84369380e-01
  -6.55503615e-02  1.92801728e-01  2.69908637e-03 -3.01024330e-01]
 [ 1.51612874e-01 -8.79407680e-02 -2.32165009e-01  2.59973487e-01
  -1.72312241e-04  2.14744823e-02  1.64080684e-03  9.20504903e-01]]

The “pima-indians-diabetes.csv” dataset can be downloaded using the following link:

https://github.com/npradaschnor/Pima-Indians-Diabetes-Dataset/blob/master/diabetes.csv
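Once fitted, the PCA object can also project data onto the learned components with transform, and map it back with inverse_transform. A minimal sketch on synthetic random data (standing in for the eight Pima features, so the snippet is self-contained):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))  # synthetic stand-in for the 8 feature columns

pca = PCA(n_components=5)
pca.fit(X)

X_reduced = pca.transform(X)                # project onto the 5 components
print(X_reduced.shape)                      # (100, 5)

X_restored = pca.inverse_transform(X_reduced)  # approximate reconstruction
print(X_restored.shape)                        # (100, 8)
```

The reconstruction is only approximate: the three discarded components carry some variance that cannot be recovered.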
Incremental PCA

Incremental Principal Component Analysis (IPCA) addresses a key limitation of PCA by allowing out-of-core processing, so datasets too large to fit in memory can be handled in batches. As with PCA, the data is centered (but not scaled) per feature before the SVD is applied. The example below shows how to apply this class to the digits dataset.

from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA

data, _ = load_digits(return_X_y=True)

transformer = IncrementalPCA(n_components=10, batch_size=100)
# partial_fit consumes one batch at a time (useful when streaming from disk);
# the fit_transform call below refits from scratch on the full dataset,
# processing it internally in chunks of batch_size.
transformer.partial_fit(data[:100, :])
transformed_data = transformer.fit_transform(data)
print(transformed_data.shape)

Output

(1797, 10)
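The real out-of-core pattern is to call partial_fit once per chunk, as if each chunk were read from disk. A short sketch, here simulating the chunks with numpy.array_split on the in-memory digits data:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA

data, _ = load_digits(return_X_y=True)

ipca = IncrementalPCA(n_components=10)
# Feed the data one chunk at a time; each chunk must contain
# at least n_components samples.
for batch in np.array_split(data, 18):
    ipca.partial_fit(batch)

reduced = ipca.transform(data)
print(reduced.shape)  # (1797, 10)
```

In a genuine out-of-core setting, each iteration would instead load one chunk from disk (for example with pandas' chunksize option) before passing it to partial_fit.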
Kernel PCA

Kernel PCA extends PCA to non-linear dimensionality reduction using kernels, and Scikit-Learn's implementation supports both transform and inverse_transform. The following example applies this algorithm to the digits dataset using the sigmoid kernel.

from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

data, _ = load_digits(return_X_y=True)

transformer = KernelPCA(n_components=10, kernel='sigmoid')
transformed_data = transformer.fit_transform(data)
print(transformed_data.shape)

Output

(1797, 10)
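Note that inverse_transform is only available when the pre-image mapping is learned during fitting, which is disabled by default. A minimal sketch enabling it via the fit_inverse_transform parameter:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

data, _ = load_digits(return_X_y=True)

# fit_inverse_transform=True learns an approximate mapping back to the
# original space (it is off by default because it adds training cost).
kpca = KernelPCA(n_components=10, kernel='sigmoid', fit_inverse_transform=True)
reduced = kpca.fit_transform(data)
restored = kpca.inverse_transform(reduced)
print(reduced.shape, restored.shape)  # (1797, 10) (1797, 64)
```

Because kernel PCA operates in an implicit feature space, the inverse mapping is learned by regression and the reconstruction is approximate.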
PCA Using Randomized SVD

PCA with a randomized SVD solver approximates the decomposition by truncating the components with the smallest singular values, which speeds up computation on large datasets while preserving most of the variance. The example below applies this solver to find the top eight principal components of the Pima Indians Diabetes dataset.

from pandas import read_csv
from sklearn.decomposition import PCA

path = "pima-indians-diabetes.csv"
headers = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataset = read_csv(path, names=headers)
values = dataset.values
# Skip the file's original header row and cast the values to floats
data = values[1:, 0:8].astype(float)
target = values[1:, 8]

pca = PCA(n_components=8, svd_solver='randomized')
fit = pca.fit(data)
print("The explained variance is: %s\n" % fit.explained_variance_ratio_)
print(fit.components_)

Output

The explained variance is: [8.88546635e-01 6.15907837e-02 2.57901189e-02 1.30861374e-02
 7.44093864e-03 3.02614919e-03 5.12444875e-04 6.79264301e-06]

[[-2.02176587e-03  9.78115765e-02  1.60930503e-02  6.07566861e-02
   9.93110844e-01  1.40108085e-02  5.37167919e-04 -3.56474430e-03]
 [ 2.26488861e-02  9.72210040e-01  1.41909330e-01 -5.78614699e-02
  -9.46266913e-02  4.69729766e-02  8.16804621e-04  1.40168181e-01]
 [ 2.24649003e-02 -1.43428710e-01  9.22467192e-01  3.07013055e-01
  -2.09773019e-02  1.32444542e-01  6.39983017e-04  1.25454310e-01]
 [-4.90459604e-02  1.19830016e-01 -2.62742788e-01  8.84369380e-01
  -6.55503615e-02  1.92801728e-01  2.69908637e-03 -3.01024330e-01]
 [ 1.51612874e-01 -8.79407680e-02 -2.32165009e-01  2.59973487e-01
  -1.72312241e-04  2.14744823e-02  1.64080684e-03  9.20504903e-01]
 [ 5.04730888e-03 -5.07391813e-02 -7.56365525e-02 -2.21363068e-01
   6.13326472e-03  9.70776708e-01  2.02903702e-03  1.51133239e-02]
 [ 9.86672995e-01  8.83426114e-04 -1.22975947e-03 -3.76444746e-04
   1.42307394e-03 -2.73046214e-03 -6.34402965e-03 -1.62555343e-01]
 [ 6.10123250e-03 -8.25459539e-04  5.20865450e-04 -2.54871909e-03
  -2.68965921e-04 -2.67341863e-03  9.99972146e-01 -1.95271966e-03]]
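Rather than fixing n_components up front, PCA can also choose the smallest number of components that retains a target fraction of the variance, by passing a float between 0 and 1 as n_components. A short sketch on the digits dataset (used here instead of the Pima data so the snippet is self-contained):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

data, _ = load_digits(return_X_y=True)

# A float n_components keeps the fewest components whose cumulative
# explained variance reaches that fraction; this mode requires the
# 'full' (exact) SVD solver.
pca = PCA(n_components=0.95, svd_solver='full')
reduced = pca.fit_transform(data)
print(reduced.shape[1], pca.explained_variance_ratio_.sum())
```

This is a convenient way to pick the dimensionality from the data itself instead of guessing a fixed number of components.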
