Boosting methods build an ensemble incrementally by training base estimators in sequence, each one correcting the errors of its predecessors. By combining many weak learners over multiple iterations, they produce a powerful ensemble. Scikit-Learn provides two main boosting methods: AdaBoost and Gradient Tree Boosting.
1. AdaBoost
AdaBoost is a successful boosting ensemble method that adjusts instance weights after each iteration, so that subsequent models focus more on the instances that earlier models misclassified. It can be used for both classification and regression.
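This sequential behavior can be observed directly: the staged_score() method of AdaBoostClassifier reports the ensemble's accuracy after each boosting round, showing how successive weak learners improve the combined model. A minimal sketch (the dataset and parameter values are illustrative choices):
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Illustrative toy dataset
data, target = make_classification(n_samples = 500, random_state = 0)
classifier = AdaBoostClassifier(n_estimators = 50, random_state = 0)
classifier.fit(data, target)

# staged_score() yields the ensemble's accuracy after each boosting iteration
for i, score in enumerate(classifier.staged_score(data, target)):
    if (i + 1) % 10 == 0:
        print(i + 1, round(score, 3))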
1.1. Classification With AdaBoost
Scikit-Learn builds an AdaBoost classifier around a base estimator supplied through the estimator parameter (named base_estimator in versions before 1.2). If left as None, it defaults to DecisionTreeClassifier(max_depth=1). The following example shows how to build an AdaBoost classifier, check its score, and make a prediction.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Generate a synthetic binary classification dataset
data, target = make_classification(n_samples = 2000, n_features = 10, n_informative = 2, n_redundant = 0, random_state = 0, shuffle = False)

# Fit an AdaBoost ensemble of 200 decision stumps
classifier = AdaBoostClassifier(n_estimators = 200, random_state = 0)
classifier.fit(data, target)

# Accuracy on the training data, then a prediction for one new sample
print(classifier.score(data, target))
print(classifier.predict([[1, 1, 1, 0, 2, 3, 0, 1, 2, 2]]))
Output
0.9905
[1]
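A different base estimator can also be supplied instead of the default stump. The sketch below assumes a recent Scikit-Learn release, where the parameter is named estimator (it was base_estimator before version 1.2):
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

data, target = make_classification(n_samples = 2000, random_state = 0)

# Depth-2 trees instead of the default depth-1 stumps; use base_estimator
# in place of estimator on Scikit-Learn versions before 1.2
classifier = AdaBoostClassifier(estimator = DecisionTreeClassifier(max_depth = 2), n_estimators = 100, random_state = 0)
classifier.fit(data, target)
print(classifier.score(data, target))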
The example below further illustrates how to build this classifier on the Pima Indians Diabetes dataset, evaluating it with 10-fold cross-validation.
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Load the dataset; the file's own header row is read in as data because
# explicit column names are supplied, so it is skipped below with [1:]
path = "pima-indians-diabetes.csv"
headers = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataset = read_csv(path, names = headers)
data = dataset.values[1:, 0:8]
target = dataset.values[1:, 8]

# Estimate accuracy with 10-fold cross-validation
kfold = KFold(n_splits = 10)
classifier = AdaBoostClassifier(n_estimators = 150)
results = cross_val_score(classifier, data, target, cv = kfold)
print(results.mean())
Output
0.7643369788106631
The “pima-indians-diabetes.csv” dataset can be downloaded using the following link:
https://github.com/npradaschnor/Pima-Indians-Diabetes-Dataset/blob/master/diabetes.csv
1.2. Regression With AdaBoost
Scikit-Learn’s AdaBoost regressor accepts parameters similar to those of its AdaBoost classifier. The following example shows how to construct the regressor and predict a new value with the predict() method.
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

# Generate a synthetic regression dataset
data, target = make_regression(n_features = 10, n_informative = 2, random_state = 0, shuffle = False)

# Fit an AdaBoost ensemble of 100 regression trees
regressor = AdaBoostRegressor(random_state = 0, n_estimators = 100)
regressor.fit(data, target)

# Predict the target for one new sample
print(regressor.predict([[0, 2, 1, 0, 1, 0, 1, 0, 2, 2]]))
Output
[75.8528769]
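To gauge generalization rather than training fit, the same regressor can be scored on a held-out split. A minimal sketch (the split size is an arbitrary choice):
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split

data, target = make_regression(n_features = 10, n_informative = 2, random_state = 0, shuffle = False)
data_train, data_test, target_train, target_test = train_test_split(data, target, test_size = 0.25, random_state = 0)

regressor = AdaBoostRegressor(n_estimators = 100, random_state = 0)
regressor.fit(data_train, target_train)

# score() returns the R^2 coefficient of determination on the test split
print(regressor.score(data_test, target_test))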
2. Gradient Tree Boosting
This method, also called Gradient Boosted Regression Trees (GBRT), generalizes boosting to arbitrary differentiable loss functions, creating an ensemble of weak prediction models. It effectively addresses regression and classification problems while handling mixed-type data efficiently.
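Because only a differentiable loss is required, Scikit-Learn's gradient boosting estimators expose several built-in losses; in recent versions, GradientBoostingRegressor accepts 'squared_error', 'absolute_error', 'huber', and 'quantile'. A brief sketch comparing two of them (the dataset and settings are illustrative):
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

data, target = make_friedman1(n_samples = 1000, noise = 1.0, random_state = 0)

# The same boosting machinery is reused; only the loss being minimized changes
for loss in ('squared_error', 'absolute_error'):
    regressor = GradientBoostingRegressor(loss = loss, random_state = 0)
    regressor.fit(data, target)
    print(loss, round(regressor.score(data, target), 3))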
2.1. Classification With Gradient Tree Boost
Scikit-Learn offers the GradientBoostingClassifier for this purpose. Its loss parameter selects the loss to optimize; for probabilistic classification it is 'log_loss' (called 'deviance' in versions before 1.1). The n_estimators parameter determines the number of weak learners, while learning_rate, typically within (0.0, 1.0], mitigates overfitting through shrinkage. The following examples construct this classifier on a synthetic dataset (fitting 100 learners) and on the Pima Indians Diabetes dataset (fitting 150 learners).
Example 1, with the synthetic dataset:
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

# Generate the synthetic Hastie et al. dataset (12000 samples by default)
data, target = make_hastie_10_2(random_state = 0)

# Hold out everything past the first 5000 samples for testing
data_train, data_test = data[:5000], data[5000:]
target_train, target_test = target[:5000], target[5000:]

# Boost 100 depth-1 trees (stumps) and score on the held-out data
classifier = GradientBoostingClassifier(n_estimators = 100, learning_rate = 1.0, max_depth = 1, random_state = 0).fit(data_train, target_train)
print(classifier.score(data_test, target_test))
Output
0.9171428571428571
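Before moving on, the shrinkage role of learning_rate can be seen by comparing test accuracy for two rates on the same split; smaller rates shrink each tree's contribution, usually requiring more estimators but often generalizing better. A minimal sketch:
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

data, target = make_hastie_10_2(random_state = 0)
data_train, data_test = data[:5000], data[5000:]
target_train, target_test = target[:5000], target[5000:]

# Compare an unshrunk ensemble with a strongly shrunk one
for rate in (1.0, 0.1):
    classifier = GradientBoostingClassifier(n_estimators = 100, learning_rate = rate, max_depth = 1, random_state = 0)
    classifier.fit(data_train, target_train)
    print(rate, classifier.score(data_test, target_test))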
Example 2, with the Pima Indians Diabetes dataset:
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Load the dataset, skipping the header row as before
path = "pima-indians-diabetes.csv"
headers = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataset = read_csv(path, names = headers)
data = dataset.values[1:, 0:8]
target = dataset.values[1:, 8]

# Evaluate 150 boosted trees, each split considering at most 5 features
kfold = KFold(n_splits = 10)
classifier = GradientBoostingClassifier(n_estimators = 150, max_features = 5)
results = cross_val_score(classifier, data, target, cv = kfold)
print(results.mean())
Output
0.761637047163363
2.2. Regression With Gradient Tree Boost
The GradientBoostingRegressor enables gradient tree boosting with a configurable loss function, defaulting to squared error for regression. The example below builds this regressor and computes the mean squared error on held-out data.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate the synthetic Friedman #1 regression dataset
data, target = make_friedman1(n_samples = 5000, random_state = 0, noise = 1.0)

# Hold out the last 1000 samples for testing
data_train, data_test = data[:4000], data[4000:]
target_train, target_test = target[:4000], target[4000:]

# Boost 100 depth-1 trees under the squared-error loss
regressor = GradientBoostingRegressor(n_estimators = 100, learning_rate = 0.1, max_depth = 1, random_state = 0, loss = 'squared_error').fit(data_train, target_train)

# Mean squared error on the held-out data
print(mean_squared_error(target_test, regressor.predict(data_test)))
Output
5.036557893137746
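Because boosting is sequential, the test error after each stage can be tracked with staged_predict(), which helps choose n_estimators. A sketch reusing the split above:
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

data, target = make_friedman1(n_samples = 5000, random_state = 0, noise = 1.0)
data_train, data_test = data[:4000], data[4000:]
target_train, target_test = target[:4000], target[4000:]

regressor = GradientBoostingRegressor(n_estimators = 200, learning_rate = 0.1, max_depth = 1, random_state = 0)
regressor.fit(data_train, target_train)

# staged_predict() yields predictions after each boosting stage, so the
# held-out error can be inspected to pick a good number of estimators
errors = [mean_squared_error(target_test, pred) for pred in regressor.staged_predict(data_test)]
best = errors.index(min(errors)) + 1
print(best, min(errors))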
