Demystifying Bayesian Optimization In Machine Learning

Every data scientist or machine learning enthusiast has spent their fair share of time experimenting with hyperparameters to optimize model performance. Traditional approaches like Grid Search or Random Search become inefficient as the search space grows in dimension, which is often the case. That's where Bayesian Optimization comes into play, providing a principled technique to guide the search for optimal hyperparameters. In this post, we take a deep dive into Bayesian Optimization and how to implement it.

What is Bayesian Optimization?

Bayesian Optimization is a sequential design strategy for global optimization of black-box functions that does not require derivatives. Here, the "black-box" function (also called the objective function) is the quantity you want to optimize, typically the validation score of a machine learning algorithm as a function of hyperparameters whose best values are not known in advance. Bayesian Optimization starts from a prior belief about the function and updates it into a posterior as evaluations come in, using that posterior to decide where to sample next in pursuit of the optimum.
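
As a toy illustration (the dataset, hyperparameters, and the black_box helper below are made up for this post), such an objective might train a model for a given hyperparameter setting and return its cross-validated score; we can query it point by point, but we have no formula or gradient for it:

# Hypothetical black-box objective: each query is a full (and therefore
# expensive) cross-validation run, and no gradient information is available.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

def black_box(C, gamma):
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

print(black_box(C=1.0, gamma=0.01))  # one (expensive) evaluation of the objective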

Bayesian Optimization Steps

The Bayesian optimization process can be broken down into the following steps:

  1. Assume a prior over the objective black-box function.
  2. Evaluate the objective function at a handful of initial (typically random) points.
  3. Update the prior to a posterior using the data points obtained so far.
  4. Choose the next hyperparameter values where the expected improvement is maximized.
  5. Repeat steps 3 and 4 until the evaluation budget runs out or the result converges (a minimal loop sketch follows this list).
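
To make the loop concrete, here is a minimal sketch using Scikit-Optimize's ask/tell interface; the quadratic expensive_objective and the search bounds are placeholder choices for illustration, standing in for a real model-training run:

from skopt import Optimizer
from skopt.space import Real

def expensive_objective(x):           # placeholder for training + validation
    return (x[0] - 0.3) ** 2

opt = Optimizer([Real(-1.0, 1.0)],    # search space
                base_estimator="GP",  # step 1: Gaussian Process prior
                acq_func="EI",        # expected improvement acquisition
                n_initial_points=5)   # step 2: initial random evaluations

for _ in range(20):
    x = opt.ask()                     # step 4: next candidate (random at first, then EI-maximizing)
    y = expensive_objective(x)        # evaluate the black-box function
    opt.tell(x, y)                    # step 3: update the posterior with the new observation

print(min(opt.yi))                    # best (lowest) objective value found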

Bayesian Optimization typically uses a Gaussian Process (GP) as the prior over the objective: the GP's posterior mean and uncertainty feed an acquisition function (such as expected improvement) that predicts which points in the search space are most likely to improve the target function.
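
For intuition on what the GP surrogate buys us, here is a rough, hand-rolled sketch on a made-up 1-D objective of how the GP posterior and the expected-improvement acquisition can be computed with scikit-learn and SciPy; gp_minimize does the equivalent internally:

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):                                   # made-up objective to maximize
    return np.sin(3 * x) + 0.5 * x

X_obs = np.array([[0.2], [1.0], [2.5]])     # points evaluated so far
y_obs = f(X_obs).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_obs, y_obs)                        # posterior over the objective

X_cand = np.linspace(0.0, 3.0, 200).reshape(-1, 1)
mu, sigma = gp.predict(X_cand, return_std=True)

best = y_obs.max()
z = (mu - best) / np.maximum(sigma, 1e-12)
ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement

x_next = X_cand[np.argmax(ei)]              # most promising point to try next
print(x_next)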

Python Implementation

For the code implementation, we'll use the popular machine learning library Scikit-learn, which provides the dataset, the classifier, and the underlying Gaussian Process machinery, together with the package Scikit-Optimize (skopt), which implements Bayesian Optimization on top of it.

# Import necessary libraries
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from skopt.space import Real, Integer
from skopt.utils import use_named_args
from skopt import gp_minimize

# Create a binary classification dataset
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Define the hyperparameter configuration space
space_svc = [Integer(1, 5, name='max_iter'),
             Real(10**-5, 10**0, "log-uniform", name='C'),
             Real(10**-9, 10**0, "log-uniform", name='gamma')]

# Objective: negative mean cross-validated accuracy (gp_minimize minimizes)
@use_named_args(space_svc)
def objective_svc(**params):
    classifier = SVC(**params)
    return -np.mean(cross_val_score(classifier, X, y, cv=5, scoring="accuracy"))

# Run Bayesian Optimization
bayes_opt_results = gp_minimize(objective_svc, space_svc, n_calls=30, random_state=0)

# Print results
best_accuracy = bayes_opt_results.fun
best_parameters = bayes_opt_results.x
print(f"Best Parameters: {best_parameters}\nBest Accuracy: {abs(best_accuracy)}")

This script creates a binary classification dataset, defines a hyperparameter space for a Support Vector Classifier, defines an objective function, and optimizes it with Bayesian Optimization using Gaussian Processes. Note that gp_minimize minimizes, so the objective returns the negative mean cross-validated accuracy; that is also why the printed result takes the absolute value. The best hyperparameters and the corresponding accuracy are then printed.
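
If you want to attach names to the best values or see how the search progressed, Scikit-Optimize also ships a convergence-plot helper; the snippet below assumes matplotlib is installed and reuses the bayes_opt_results object from the script above:

import matplotlib.pyplot as plt
from skopt.plots import plot_convergence

# Label the best values with their parameter names (same order as space_svc)
best = dict(zip(['max_iter', 'C', 'gamma'], bayes_opt_results.x))
print(best)

# Best objective value found as a function of the number of calls
plot_convergence(bayes_opt_results)
plt.show()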

Summary

Bayesian Optimization is a powerful strategy for hyperparameter tuning, offering a more efficient and less manually intensive alternative to traditional approaches. Today we learned about Bayesian Optimization and implemented it using Python, Scikit-learn, and Scikit-Optimize. Adopting it is a step toward becoming a more efficient data scientist or machine learning engineer who no longer needs to tweak hyperparameters by tedious trial and error.