Gradient Boosting Machines (GBM) are among the most popular machine learning algorithms used in practice. They are highly powerful and flexible, and can model complex nonlinear relationships.
In simple terms, Gradient Boosting works by combining many weak prediction models into a single strong one. In the case of regression problems, these weak learners are typically decision trees.
In this blog post, we are going to delve a little deeper into the mechanics of GBM and implement a simple version of the algorithm with Python.
The core principle behind Gradient Boosting Machines is to construct new base learners that are maximally correlated with the negative gradient of the loss function associated with the whole ensemble.
Consider a decision tree as the base learner. We denote our prediction after $m$ iterations (i.e., after $m$ trees have been added) as:

$$F_m(x) = F_{m-1}(x) + \nu \, h_m(x),$$

where $h_m(x)$ is the new base learner that we add to the model at iteration $m$, and $\nu$ is the learning rate (the `learning_rate` in the code below).
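To make the "negative gradient" idea concrete, consider the squared error loss, which is what the simplified implementation below effectively uses. Its negative gradient with respect to the current prediction is just the residual, which is exactly what each new tree is fit to:

$$L(y, F) = \tfrac{1}{2}\,(y - F)^2, \qquad -\frac{\partial L}{\partial F} = y - F.$$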
Let's start by loading the necessary libraries and creating a synthetic dataset to practice on.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate a binary classification dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```
Next, we implement our GBM:
```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error as mse

# Initialize f_0(x) = 0 for all x in X
F = np.zeros(y_train.shape)
F_test = np.zeros(y_test.shape)

learning_rate = 0.1
n_estimators = 100

# Iterate and add decision trees to F
for i in range(n_estimators):
    # Compute pseudo-residuals (negative gradient of the squared error loss)
    r = -(F - y_train)

    # Fit a decision tree to the residuals
    h = DecisionTreeRegressor(max_depth=2).fit(X_train, r)

    # Update F with the scaled decision tree predictions
    F += learning_rate * h.predict(X_train)
    F_test += learning_rate * h.predict(X_test)

    # Compute MSE
    train_mse = mse(y_train, F)
    test_mse = mse(y_test, F_test)
    print(f"Step {i+1}, Train MSE: {train_mse:.4f}, Test MSE: {test_mse:.4f}")
```
In this code, we start with a zero prediction for each data point, then iteratively train decision trees on the pseudo-residuals and add their scaled predictions to our model F.
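Because the labels in this dataset are 0/1, we can get a rough sense of classification quality by thresholding the final predictions at 0.5. This is only a sanity check for our squared-loss sketch, not a proper classification setup (a real GBM classifier would use a logistic loss):

```python
from sklearn.metrics import accuracy_score

# Threshold the real-valued predictions at 0.5 to obtain class labels.
test_accuracy = accuracy_score(y_test, (F_test >= 0.5).astype(int))
print(f"Test accuracy (0.5 threshold): {test_accuracy:.3f}")
```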
In this short post, we explored the idea behind Gradient Boosting Machines and implemented a simplified version of the algorithm in Python. Keep in mind that this is a rudimentary approach: in practice, we would use optimized libraries like XGBoost or LightGBM, which employ various mechanisms to train faster and prevent overfitting.
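For reference, here is a rough sketch of how the same dataset could be handled with scikit-learn's built-in GradientBoostingClassifier, with hyperparameters chosen to mirror our toy loop (XGBoost and LightGBM expose similar interfaces):

```python
from sklearn.ensemble import GradientBoostingClassifier

# Roughly mirrors our toy settings: 100 shallow trees, learning rate 0.1.
clf = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=2, random_state=42
)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")
```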