An Introduction To Gradient Boosting Machines In Python

Introduction

Gradient Boosting Machines (GBM) are among the most popular machine learning algorithms used in practice. They are powerful, flexible, and able to model complex nonlinear relationships.

In simple terms, Gradient Boosting works by combining many weak prediction models into a single strong one. The weak learners are typically decision trees, specifically regression trees, even when the final task is classification.
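Conceptually, the final model is just a sum of many small trees, each scaled by a learning rate. The sketch below only illustrates the shape of the final predictor; trees and learning_rate are placeholders here, and we will build the real thing later in the post.

# Conceptual sketch only: the boosted prediction is a scaled sum of weak trees.
# `trees` and `learning_rate` are placeholders; the real loop is implemented below.
def boosted_predict(x, trees, learning_rate):
    return sum(learning_rate * tree.predict(x) for tree in trees)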

In this blog post, we are going to delve a little deeper into the mechanics of GBM and implement a simple version of the algorithm with Python.

Understanding the Concept

The core principle behind Gradient Boosting Machines is to construct each new base learner so that it is maximally correlated with the negative gradient of the loss function, evaluated for the ensemble built so far.

With a decision tree as the base learner, we denote our prediction after m iterations (that is, after m trees have been added) as:

F_m(x) = F_{m-1}(x) + learning_rate * h_m(x)

where h_m(x) is the new base learner that we'll add to our model, fitted to the pseudo-residuals, i.e. the negative gradient of the loss with respect to the current prediction F_{m-1}(x).
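To make that connection concrete, here is a tiny standalone check (not part of the main implementation below): for the squared-error loss, the negative gradient with respect to the current prediction is exactly the residual y - F, which is what each new tree gets fit to.

# Standalone check: for the squared-error loss L(y, F) = 0.5 * (y - F)**2,
# the negative gradient with respect to F equals the residual y - F.
def loss(y, F):
    return 0.5 * (y - F) ** 2

y_i, F_i = 1.0, 0.3
eps = 1e-6

# Central finite-difference estimate of dL/dF at F_i.
numeric_grad = (loss(y_i, F_i + eps) - loss(y_i, F_i - eps)) / (2 * eps)

print(-numeric_grad)  # ~0.7
print(y_i - F_i)      # 0.7, matches the residual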

Python Code

Let's start by loading the necessary libraries and creating a synthetic dataset to practice on.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate a binary classification dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
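As a quick, optional sanity check (not required for the rest of the post), we can confirm that the 70/30 split produced 350 training rows and 150 test rows:

# Optional check: 70/30 split of 500 samples with 10 features each.
print(X_train.shape, X_test.shape)  # (350, 10) (150, 10)
print(y_train.shape, y_test.shape)  # (350,) (150,)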

Next, we implement our GBM.

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error as mse

# Initialize f_0(x) = 0 for all x in X
F = np.zeros(y_train.shape)
F_test = np.zeros(y_test.shape)

learning_rate = 0.1
n_estimators = 100

# Iterate and add decision trees to F
for i in range(n_estimators):
    # Compute pseudo-residuals (negative gradient of the squared-error loss)
    r = -(F - y_train)
    # Fit a decision tree to the residuals
    h = DecisionTreeRegressor(max_depth=2).fit(X_train, r)
    # Update F with the scaled decision tree predictions
    F += learning_rate * h.predict(X_train)
    F_test += learning_rate * h.predict(X_test)
    # Compute MSE on the train and test sets
    train_mse = mse(y_train, F)
    test_mse = mse(y_test, F_test)
    print(f"Step {i+1}, Train MSE: {train_mse:.4f}, Test MSE: {test_mse:.4f}")

In this code, we start with a zero prediction for each data point, then iteratively train decision tree learners on the residuals and add their scaled predictions to our model F.
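Because make_classification produces 0/1 labels and our model outputs continuous scores, we can also turn the final F_test into hard class predictions. The snippet below is a rough add-on to the loop above; thresholding at 0.5 is an assumption that only makes sense here because the targets are exactly 0 and 1.

from sklearn.metrics import accuracy_score

# Threshold the continuous scores at 0.5 to get hard 0/1 class predictions.
y_pred = (F_test >= 0.5).astype(int)
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.4f}")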

Conclusion

In this short post, we explored the idea behind Gradient Boosting Machines and implemented a simplified version of the algorithm in Python. Keep in mind that this is a rudimentary approach; in practice, we would use optimized libraries like XGBoost or LightGBM, which employ various mechanisms to train faster and prevent overfitting.
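As a point of comparison, here is a sketch of the same task using scikit-learn's built-in GradientBoostingClassifier, with hyperparameters chosen to roughly mirror the hand-rolled loop above; XGBoost and LightGBM expose a very similar fit/predict interface.

from sklearn.ensemble import GradientBoostingClassifier

# Roughly mirrors the manual loop: 100 depth-2 trees with a 0.1 learning rate.
gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=2, random_state=42)
gbc.fit(X_train, y_train)
print(f"Test accuracy: {gbc.score(X_test, y_test):.4f}")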