Decision Tree Regression is a supervised learning technique used to solve regression problems, that is, to predict a continuous output. Decision trees as a family handle both continuous and categorical targets: regression trees predict the former, classification trees the latter. In this technique, we recursively split the population or sample into two or more homogeneous sets (sub-populations) based on the most significant attributes, or independent variables.
The decision tree regression model splits the data with binary decisions: at each node, the dataset is divided in two based on the attribute and threshold that yield the most homogeneous sub-nodes. Different tree algorithms use different split criteria. ID3 uses entropy and information gain, but that criterion applies to classification; regression trees (including scikit-learn's CART implementation) instead choose the split that most reduces the variance, i.e. the mean squared error, of the target within the child nodes.
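To make the variance-reduction criterion concrete, here is a minimal sketch on hypothetical one-dimensional data (the array values below are made up for illustration). It tries every candidate threshold and keeps the one that most reduces the weighted mean squared error of the two resulting groups:

```python
import numpy as np

# Hypothetical toy data: two clusters with clearly different target means.
X = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.5])

def mse(values):
    # Mean squared error of predicting the mean of `values`.
    return np.mean((values - values.mean()) ** 2)

def best_split(X, y):
    # Scan candidate thresholds (midpoints between sorted x values)
    # and keep the one with the largest variance reduction.
    best_threshold, best_reduction = None, -np.inf
    for threshold in (X[:-1] + X[1:]) / 2:
        left, right = y[X <= threshold], y[X > threshold]
        weighted = (len(left) * mse(left) + len(right) * mse(right)) / len(y)
        reduction = mse(y) - weighted
        if reduction > best_reduction:
            best_threshold, best_reduction = threshold, reduction
    return best_threshold, best_reduction

threshold, reduction = best_split(X, y)
print(threshold)  # → 6.5, the midpoint between the two clusters
```

A real tree applies this search recursively to each sub-node until a stopping condition (depth limit, minimum samples, or pure leaves) is met.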
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
For this blog post, we will use the Auto MPG dataset from UCI Machine Learning Repository.
dataset = pd.read_csv('auto-mpg.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
regressor = DecisionTreeRegressor(random_state = 0)
regressor.fit(X_train, y_train)
y_predicted = regressor.predict(X_test)
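Before visualizing, it is worth quantifying how good the predictions are. The sketch below shows the usual approach with `mean_squared_error` and `r2_score` from `sklearn.metrics`; since the CSV file is not bundled here, it uses synthetic stand-in data (the feature generator and coefficients are assumptions, not the Auto MPG data):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in for the dataset (hypothetical), so the snippet
# runs without the auto-mpg.csv file.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
regressor = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
y_predicted = regressor.predict(X_test)

# Lower MSE and higher R^2 (closer to 1) indicate a better fit.
print("MSE:", mean_squared_error(y_test, y_predicted))
print("R^2:", r2_score(y_test, y_predicted))
```

The same two lines of metric code work unchanged on the Auto MPG split above.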
Because the Auto MPG features span several columns, we cannot scatter X_test directly against y_test; instead we plot predicted against actual values, where a perfect model would place every point on the diagonal:

plt.scatter(y_test, y_predicted, color = 'red')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color = 'blue')
plt.title('Auto MPG (Decision Tree Regression)')
plt.xlabel('Actual MPG')
plt.ylabel('Predicted MPG')
plt.show()
And that is a basic implementation of Decision Tree Regression in Python! We've covered the core theory of decision tree regression along with its implementation. The sample code is straightforward and should serve as a solid starting point for anyone looking to get into decision tree regression.
Thus, Decision Tree Regression not only makes predictions but also reveals which variables the output depends on most strongly, something that is not as easily achieved with many other regression models.
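In scikit-learn, this insight comes from the fitted tree's `feature_importances_` attribute. A minimal sketch on hypothetical data, where the target is constructed to depend almost entirely on the first feature, so the tree should rank it highest:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: y depends strongly on feature 0, weakly on
# feature 1, and not at all on feature 2.
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 5.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=300)

regressor = DecisionTreeRegressor(random_state=0).fit(X, y)

# feature_importances_ sums to 1; larger values mean the feature
# drove more impactful splits.
print(regressor.feature_importances_)
```

On the Auto MPG model, the same attribute pairs each importance score with the corresponding dataset column, making it easy to see which attributes drive fuel efficiency.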