Introduction To Decision Tree Regression With Python

Introduction to Decision Tree Regression

Decision Tree Regression is a type of supervised learning algorithm that is majorly used for solving regression problems. It works for both continuous as well as categorical output variables. In this technique, we split the population or sample into two or more homogeneous sets (or sub-populations) based on the most effective attributes or independent variables.

Advantages of Decision Tree Regression

  1. It can handle both categorical and numerical data.
  2. It is easy to understand and visualize.
  3. Requires less data preprocessing.

The Decision Tree Regression Process

The Decision tree regression model uses the binary decision method to split the data. It divides the dataset based on the most effective attributes or independent variables. Decision trees use multiple algorithms to decide to split a node into two or more sub-nodes. The ID3 algorithm uses entropy and information gain to decide the split.

Step 1: Import Necessary Libraries

import numpy as np import matplotlib.pyplot as plt import pandas as pd from sklearn.model_selection import train_test_split from sklearn.tree import DecisionTreeRegressor

Step 2: Load the Dataset

For this blog post, we will use the Auto MPG dataset from UCI Machine Learning Repository.

dataset = pd.read_csv('auto-mpg.csv') X = dataset.iloc[:, :-1].values y = dataset.iloc[:, -1].values

Step 3: Splitting the dataset into training and testing set.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

Step 4: Fitting the Decision Tree Regression to our dataset

regressor = DecisionTreeRegressor(random_state = 0) regressor.fit(X_train, y_train)

Step 5: Predict the Test Set Result

y_predicted = regressor.predict(X_test)

Step 6: Visualize the Test Set Result

plt.scatter(X_test, y_test, color = 'red') plt.plot(X_train, regressor.predict(X_train), color = 'blue') plt.title('Auto MPG (Decision Tree Regression)') plt.xlabel('Attributes') plt.ylabel('MPG') plt.show()

And that is a basic implementation of Decision Tree Regression in Python! We've looked at the basic theory of Decision Tree regression along with its implementation. The sample code provided here is straightforward and should serve as a solid starting point for anyone looking to get into decision tree regression.

Summary

Thus, Decision Tree Regression not only helps in predictions but also helps in identifying the variables on which the output is highly dependent, which is not easily achievable in other regression models.