Dimensionality reduction transforms data into a smaller set of principal features while preserving as much of its structure as possible, and Principal Component Analysis (PCA) is a widely used algorithm for this task. Scikit-Learn provides PCA in several variants, described below.
Exact PCA
PCA performs linear dimensionality reduction by applying Singular Value Decomposition (SVD) to the centered input data. Scikit-Learn’s PCA class is a transformer: it learns n_components components during fitting and can then project new data onto them. The following example uses this class to extract the first five principal components of the Pima Indians Diabetes dataset.
from pandas import read_csv
from sklearn.decomposition import PCA

path = "pima-indians-diabetes.csv"
headers = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataset = read_csv(path, names = headers)

# The first row of the file contains the original column names, so skip
# it and cast the remaining values to float.
values = dataset.values
data = values[1:, 0:8].astype(float)
target = values[1:, 8]

# Fit PCA and report how much of the variance each component explains.
pca = PCA(n_components = 5)
fit = pca.fit(data)
print("The explained variance is: %s\n" % fit.explained_variance_ratio_)
print(fit.components_)
Output
The explained variance is: [0.88854663 0.06159078 0.02579012 0.01308614 0.00744094]
[[-2.02176587e-03 9.78115765e-02 1.60930503e-02 6.07566861e-02
9.93110844e-01 1.40108085e-02 5.37167919e-04 -3.56474430e-03]
[ 2.26488861e-02 9.72210040e-01 1.41909330e-01 -5.78614699e-02
-9.46266913e-02 4.69729766e-02 8.16804621e-04 1.40168181e-01]
[ 2.24649003e-02 -1.43428710e-01 9.22467192e-01 3.07013055e-01
-2.09773019e-02 1.32444542e-01 6.39983017e-04 1.25454310e-01]
[-4.90459604e-02 1.19830016e-01 -2.62742788e-01 8.84369380e-01
-6.55503615e-02 1.92801728e-01 2.69908637e-03 -3.01024330e-01]
[ 1.51612874e-01 -8.79407680e-02 -2.32165009e-01 2.59973487e-01
-1.72312241e-04 2.14744823e-02 1.64080684e-03 9.20504903e-01]]
The “pima-indians-diabetes.csv” dataset can be downloaded using the following link:
https://github.com/npradaschnor/Pima-Indians-Diabetes-Dataset/blob/master/diabetes.csv
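Because the fitted PCA object is a transformer, the learned components can also be used to project data into the reduced space. The following is a minimal sketch continuing from the code above; the printed shape reflects the 768 samples in the dataset.

# Project the samples onto the five learned components.
reduced = pca.transform(data)
print(reduced.shape)   # (768, 5)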
Incremental PCA
Incremental Principal Component Analysis (IPCA) addresses the main limitation of PCA, which is that it requires the entire dataset to fit in memory. IPCA instead processes the data in mini-batches, enabling out-of-core learning. Like PCA, it centers but does not scale the input features before applying SVD. The example below shows how to apply this class to the digits dataset.
from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA

data, _ = load_digits(return_X_y = True)
transformer = IncrementalPCA(n_components = 10, batch_size = 100)

# partial_fit updates the component estimate from a single batch.
transformer.partial_fit(data[:100, :])

# fit_transform refits from scratch, internally iterating over the data
# in batches of batch_size, then projects every sample onto 10 components.
transformed_data = transformer.fit_transform(data)
print(transformed_data.shape)
Output
(1797, 10)
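For data that truly does not fit in memory, partial_fit can be called once per chunk rather than passing the whole array to fit_transform. Below is a minimal sketch of this pattern; the chunking via np.array_split is only a stand-in for whatever mechanism actually streams the batches.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import IncrementalPCA

data, _ = load_digits(return_X_y = True)
ipca = IncrementalPCA(n_components = 10)

# Feed the estimator one batch at a time; each call updates the running
# estimate of the components without keeping earlier batches in memory.
for batch in np.array_split(data, 18):
    ipca.partial_fit(batch)

# Projection can likewise be done batch by batch.
print(ipca.transform(data[:100]).shape)   # (100, 10)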
Kernel PCA
Kernel PCA extends PCA to non-linear dimensionality reduction through the use of kernels; Scikit-Learn’s KernelPCA class supports both transform and inverse_transform. The following example applies this algorithm to the digits dataset using the sigmoid kernel.
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

data, _ = load_digits(return_X_y = True)

# Project the 64-dimensional digits data onto 10 components using the
# sigmoid kernel.
transformer = KernelPCA(n_components = 10, kernel = 'sigmoid')
transformed_data = transformer.fit_transform(data)
print(transformed_data.shape)
Output
(1797, 10)
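The example above does not exercise inverse_transform. To map the reduced representation back to the original 64-dimensional feature space, KernelPCA must be constructed with fit_inverse_transform = True, which learns an approximate inverse mapping; a minimal sketch:

from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

data, _ = load_digits(return_X_y = True)
transformer = KernelPCA(n_components = 10, kernel = 'sigmoid',
                        fit_inverse_transform = True)
transformed_data = transformer.fit_transform(data)

# The reconstruction is approximate, since information is lost in the
# projection down to 10 components.
reconstructed = transformer.inverse_transform(transformed_data)
print(reconstructed.shape)   # (1797, 64)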
PCA Using Randomized SVD
PCA with a randomized SVD solver approximates the decomposition by random sampling, discarding the components associated with the smallest singular values, which makes it considerably faster than exact SVD on large matrices. The example below uses this solver to compute all eight principal components of the Pima Indians Diabetes dataset.
from pandas import read_csv
from sklearn.decomposition import PCA

path = "pima-indians-diabetes.csv"
headers = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataset = read_csv(path, names = headers)

# Skip the header row read in as data and cast the values to float.
values = dataset.values
data = values[1:, 0:8].astype(float)
target = values[1:, 8]

# The randomized solver approximates the SVD by random sampling, which
# is faster than the exact solver on large matrices.
pca = PCA(n_components = 8, svd_solver = 'randomized')
fit = pca.fit(data)
print("The explained variance is: %s\n" % fit.explained_variance_ratio_)
print(fit.components_)
Output
The explained variance is: [8.88546635e-01 6.15907837e-02 2.57901189e-02 1.30861374e-02
7.44093864e-03 3.02614919e-03 5.12444875e-04 6.79264301e-06]
[[-2.02176587e-03 9.78115765e-02 1.60930503e-02 6.07566861e-02
9.93110844e-01 1.40108085e-02 5.37167919e-04 -3.56474430e-03]
[ 2.26488861e-02 9.72210040e-01 1.41909330e-01 -5.78614699e-02
-9.46266913e-02 4.69729766e-02 8.16804621e-04 1.40168181e-01]
[ 2.24649003e-02 -1.43428710e-01 9.22467192e-01 3.07013055e-01
-2.09773019e-02 1.32444542e-01 6.39983017e-04 1.25454310e-01]
[-4.90459604e-02 1.19830016e-01 -2.62742788e-01 8.84369380e-01
-6.55503615e-02 1.92801728e-01 2.69908637e-03 -3.01024330e-01]
[ 1.51612874e-01 -8.79407680e-02 -2.32165009e-01 2.59973487e-01
-1.72312241e-04 2.14744823e-02 1.64080684e-03 9.20504903e-01]
[ 5.04730888e-03 -5.07391813e-02 -7.56365525e-02 -2.21363068e-01
6.13326472e-03 9.70776708e-01 2.02903702e-03 1.51133239e-02]
[ 9.86672995e-01 8.83426114e-04 -1.22975947e-03 -3.76444746e-04
1.42307394e-03 -2.73046214e-03 -6.34402965e-03 -1.62555343e-01]
[ 6.10123250e-03 -8.25459539e-04 5.20865450e-04 -2.54871909e-03
-2.68965921e-04 -2.67341863e-03 9.99972146e-01 -1.95271966e-03]]
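The explained-variance ratios printed above are often used to decide how many components are worth keeping. A minimal sketch, reusing the fitted pca object from the example above:

import numpy as np

# Cumulative share of variance explained by the first k components.
cumulative = np.cumsum(fit.explained_variance_ratio_)
print(cumulative)

# Smallest number of components retaining at least 99% of the variance;
# for the ratios shown above this is 5.
print(np.argmax(cumulative >= 0.99) + 1)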