Boosting methods build an ensemble incrementally by training base estimators in sequence, each one correcting the errors of its predecessors. By combining many weak learners over multiple iterations, they produce a powerful ensemble. Scikit-Learn provides two main boosting methods: AdaBoost and Gradient Tree Boosting.
1. AdaBoost
AdaBoost is a successful boosting ensemble method that adjusts instance weights after each iteration, so that subsequent models focus more on the instances that earlier models misclassified. It can be used for both classification and regression.
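This sequential behavior can be observed directly: the staged_score() method of AdaBoostClassifier reports the ensemble's accuracy after each boosting round, showing how successive weak learners improve the combined model. A minimal sketch (the dataset and parameter values are illustrative choices):
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Illustrative toy dataset
data, target = make_classification(n_samples = 500, random_state = 0)
classifier = AdaBoostClassifier(n_estimators = 50, random_state = 0)
classifier.fit(data, target)

# staged_score() yields the ensemble's accuracy after each boosting iteration
for i, score in enumerate(classifier.staged_score(data, target)):
    if (i + 1) % 10 == 0:
        print(i + 1, round(score, 3))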
1.1. Classification With AdaBoost
Scikit-Learn builds an AdaBoost classifier around a base estimator supplied through the estimator parameter (named base_estimator in versions before 1.2). If left as None, it defaults to DecisionTreeClassifier(max_depth=1). The following example shows how to build an AdaBoost classifier, check its score, and make a prediction.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Generate a synthetic binary classification dataset
data, target = make_classification(n_samples = 2000, n_features = 10, n_informative = 2, n_redundant = 0, random_state = 0, shuffle = False)

# Fit an AdaBoost ensemble of 200 decision stumps
classifier = AdaBoostClassifier(n_estimators = 200, random_state = 0)
classifier.fit(data, target)

# Accuracy on the training data, then a prediction for one new sample
print(classifier.score(data, target))
print(classifier.predict([[1, 1, 1, 0, 2, 3, 0, 1, 2, 2]]))
Output
0.9905
[1]
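A different base estimator can also be supplied instead of the default stump. The sketch below assumes a recent Scikit-Learn release, where the parameter is named estimator (it was base_estimator before version 1.2):
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

data, target = make_classification(n_samples = 2000, random_state = 0)

# Depth-2 trees instead of the default depth-1 stumps; use base_estimator
# in place of estimator on Scikit-Learn versions before 1.2
classifier = AdaBoostClassifier(estimator = DecisionTreeClassifier(max_depth = 2), n_estimators = 100, random_state = 0)
classifier.fit(data, target)
print(classifier.score(data, target))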
The example below further illustrates how to build this classifier on the Pima Indians Diabetes dataset, evaluating it with 10-fold cross-validation.
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Load the dataset; the file's own header row is read in as data because
# explicit column names are supplied, so it is skipped below with [1:]
path = "pima-indians-diabetes.csv"
headers = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataset = read_csv(path, names = headers)
data = dataset.values[1:, 0:8]
target = dataset.values[1:, 8]

# Estimate accuracy with 10-fold cross-validation
kfold = KFold(n_splits = 10)
classifier = AdaBoostClassifier(n_estimators = 150)
results = cross_val_score(classifier, data, target, cv = kfold)
print(results.mean())
Output
0.7643369788106631
The “pima-indians-diabetes.csv” dataset can be downloaded using the following link:
https://github.com/npradaschnor/Pima-Indians-Diabetes-Dataset/blob/master/diabetes.csv
1.2. Regression With AdaBoost
Scikit-Learn’s AdaBoost regressor accepts parameters similar to those of its AdaBoost classifier. The following example shows how to construct the regressor and predict a new value with the predict() method.
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

# Generate a synthetic regression dataset
data, target = make_regression(n_features = 10, n_informative = 2, random_state = 0, shuffle = False)

# Fit an AdaBoost ensemble of 100 regression trees
regressor = AdaBoostRegressor(random_state = 0, n_estimators = 100)
regressor.fit(data, target)

# Predict the target for one new sample
print(regressor.predict([[0, 2, 1, 0, 1, 0, 1, 0, 2, 2]]))
Output
[75.8528769]
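To gauge generalization rather than training fit, the same regressor can be scored on a held-out split. A minimal sketch (the split size is an arbitrary choice):
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split

data, target = make_regression(n_features = 10, n_informative = 2, random_state = 0, shuffle = False)
data_train, data_test, target_train, target_test = train_test_split(data, target, test_size = 0.25, random_state = 0)

regressor = AdaBoostRegressor(n_estimators = 100, random_state = 0)
regressor.fit(data_train, target_train)

# score() returns the R^2 coefficient of determination on the test split
print(regressor.score(data_test, target_test))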
2. Gradient Tree Boosting
This method, also called Gradient Boosted Regression Trees (GBRT), generalizes boosting to arbitrary differentiable loss functions, creating an ensemble of weak prediction models. It effectively addresses regression and classification problems while handling mixed-type data efficiently.
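Because only a differentiable loss is required, Scikit-Learn's gradient boosting estimators expose several built-in losses; in recent versions, GradientBoostingRegressor accepts 'squared_error', 'absolute_error', 'huber', and 'quantile'. A brief sketch comparing two of them (the dataset and settings are illustrative):
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

data, target = make_friedman1(n_samples = 1000, noise = 1.0, random_state = 0)

# The same boosting machinery is reused; only the loss being minimized changes
for loss in ('squared_error', 'absolute_error'):
    regressor = GradientBoostingRegressor(loss = loss, random_state = 0)
    regressor.fit(data, target)
    print(loss, round(regressor.score(data, target), 3))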
2.1. Classification With Gradient Tree Boost
Scikit-Learn offers the GradientBoostingClassifier for this purpose. Its loss parameter selects the loss to optimize; for probabilistic classification it is 'log_loss' (called 'deviance' in versions before 1.1). The n_estimators parameter determines the number of weak learners, while learning_rate, typically within (0.0, 1.0], mitigates overfitting through shrinkage. The following examples construct this classifier on a synthetic dataset (fitting 100 learners) and on the Pima Indians Diabetes dataset (fitting 150 learners).
Example 1, with the synthetic dataset:
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

# Generate the synthetic Hastie et al. dataset (12000 samples by default)
data, target = make_hastie_10_2(random_state = 0)

# Hold out everything past the first 5000 samples for testing
data_train, data_test = data[:5000], data[5000:]
target_train, target_test = target[:5000], target[5000:]

# Boost 100 depth-1 trees (stumps) and score on the held-out data
classifier = GradientBoostingClassifier(n_estimators = 100, learning_rate = 1.0, max_depth = 1, random_state = 0).fit(data_train, target_train)
print(classifier.score(data_test, target_test))
Output
0.9171428571428571
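Before moving on, the shrinkage role of learning_rate can be seen by comparing test accuracy for two rates on the same split; smaller rates shrink each tree's contribution, usually requiring more estimators but often generalizing better. A minimal sketch:
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

data, target = make_hastie_10_2(random_state = 0)
data_train, data_test = data[:5000], data[5000:]
target_train, target_test = target[:5000], target[5000:]

# Compare an unshrunk ensemble with a strongly shrunk one
for rate in (1.0, 0.1):
    classifier = GradientBoostingClassifier(n_estimators = 100, learning_rate = rate, max_depth = 1, random_state = 0)
    classifier.fit(data_train, target_train)
    print(rate, classifier.score(data_test, target_test))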
Example 2, with the Pima Indians Diabetes dataset:
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Load the dataset, skipping the header row as before
path = "pima-indians-diabetes.csv"
headers = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataset = read_csv(path, names = headers)
data = dataset.values[1:, 0:8]
target = dataset.values[1:, 8]

# Evaluate 150 boosted trees, each split considering at most 5 features
kfold = KFold(n_splits = 10)
classifier = GradientBoostingClassifier(n_estimators = 150, max_features = 5)
results = cross_val_score(classifier, data, target, cv = kfold)
print(results.mean())
Output
0.761637047163363
2.2. Regression With Gradient Tree Boost
The GradientBoostingRegressor enables gradient tree boosting with a configurable loss function, defaulting to squared error for regression. The example below builds this regressor and computes the mean squared error on held-out data.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate the synthetic Friedman #1 regression dataset
data, target = make_friedman1(n_samples = 5000, random_state = 0, noise = 1.0)

# Hold out the last 1000 samples for testing
data_train, data_test = data[:4000], data[4000:]
target_train, target_test = target[:4000], target[4000:]

# Boost 100 depth-1 trees under the squared-error loss
regressor = GradientBoostingRegressor(n_estimators = 100, learning_rate = 0.1, max_depth = 1, random_state = 0, loss = 'squared_error').fit(data_train, target_train)

# Mean squared error on the held-out data
print(mean_squared_error(target_test, regressor.predict(data_test)))
Output
5.036557893137746
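Because boosting is sequential, the test error after each stage can be tracked with staged_predict(), which helps choose n_estimators. A sketch reusing the split above:
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

data, target = make_friedman1(n_samples = 5000, random_state = 0, noise = 1.0)
data_train, data_test = data[:4000], data[4000:]
target_train, target_test = target[:4000], target[4000:]

regressor = GradientBoostingRegressor(n_estimators = 200, learning_rate = 0.1, max_depth = 1, random_state = 0)
regressor.fit(data_train, target_train)

# staged_predict() yields predictions after each boosting stage, so the
# held-out error can be inspected to pick a good number of estimators
errors = [mean_squared_error(target_test, pred) for pred in regressor.staged_predict(data_test)]
best = errors.index(min(errors)) + 1
print(best, min(errors))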
