Today, let's dive deep into an essential concept in Data Science and Machine Learning: Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC)!
ROC curves provide a comprehensive view of the performance of a binary classification model. Moreover, the AUC measurement gives a single number to help compare different models. Here, we will understand these concepts better and also implement them using Python!
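In Machine Learning, an ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold varies. TPR = TP / (TP + FN) is the fraction of actual positives the model catches, and FPR = FP / (FP + TN) is the fraction of actual negatives it wrongly flags as positive. The ideal point is the top-left corner of the plot: a false positive rate of zero and a true positive rate of one.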
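To make the threshold sweep concrete, here is a minimal NumPy sketch that lowers the threshold through each distinct score and records one (FPR, TPR) point per step. The helper `roc_points` is hypothetical, written only for illustration, and the toy labels and scores are made up; for this input, scikit-learn's `metrics.roc_curve` returns the same points.

```python
import numpy as np

def roc_points(y_true, scores):
    """Hypothetical helper: one (FPR, TPR) point per distinct threshold, highest first."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    # Threshold above every score: nothing is predicted positive
    fpr, tpr = [0.0], [0.0]
    for t in np.unique(scores)[::-1]:
        pred = scores >= t  # predict positive when the score reaches the threshold
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        tpr.append(float(tp / n_pos))
        fpr.append(float(fp / n_neg))
    return fpr, tpr

# Made-up toy example: two negatives, two positives
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
print(list(zip(fpr, tpr)))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

Each lower threshold can only add predicted positives, so both rates are non-decreasing and the curve runs from (0, 0) to (1, 1).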
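AUC stands for "Area Under the ROC Curve". It summarizes how well the model distinguishes between the classes across all thresholds: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The higher the AUC, the better the model!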
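Both readings of AUC can be checked on a small made-up example: integrating the ROC points with the trapezoidal rule (the rule `metrics.auc` uses) and counting how often a positive outscores a negative give the same value. The helpers `auc_trapezoid` and `auc_pairwise` are hypothetical, written only for this sketch.

```python
import numpy as np

def auc_trapezoid(fpr, tpr):
    """Hypothetical helper: area under piecewise-linear ROC points (trapezoidal rule)."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

def auc_pairwise(y_true, scores):
    """Hypothetical helper: P(random positive outscores random negative), ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# ROC points of the toy labels [0, 0, 1, 1] with scores [0.1, 0.4, 0.35, 0.8]
fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 1.0, 1.0]
print(auc_trapezoid(fpr, tpr))                            # 0.75
print(auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Three of the four positive/negative pairs are ranked correctly (only 0.35 versus 0.4 is not), which is exactly the 0.75 area under the curve.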
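Let's implement this on the iris dataset using Python's scikit-learn (sklearn) library. Iris has three classes, while an ROC curve describes a binary decision, so we binarize the labels and evaluate each class one-vs-rest: that class is the positive class and the other two are negative.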
First, we load the necessary libraries and the dataset.
```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import label_binarize

# loading the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Binarize the labels
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]

# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=0)

# Training the Random Forest classifier
classifier = RandomForestClassifier(max_depth=2, random_state=0)
classifier.fit(X_train, y_train)
```
Now, we will compute the ROC curve and ROC area for each class.
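```python
# Predicting test data. Because y was binarized, the forest is a multi-output
# model: predict_proba returns a list with one (n_samples, 2) array per class,
# whose column 1 is the probability that the sample belongs to that class.
y_score = np.column_stack([proba[:, 1] for proba in classifier.predict_proba(X_test)])

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = metrics.roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = metrics.auc(fpr[i], tpr[i])

# Print AUC scores for each class
for i in range(n_classes):
    print("AUC for class", i, ":", roc_auc[i])
```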
This prints one AUC for each of the three iris classes, each computed one-vs-rest.
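As a cross-check, `metrics.roc_auc_score` accepts the binarized label matrix directly and, with `average=None`, returns the per-class AUCs in one call; `average="macro"` gives their unweighted mean. The sketch below repeats the setup so it runs on its own, collecting column 1 of each per-class probability array from the multi-output forest into a score matrix.

```python
import numpy as np
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Same setup: binarized iris labels, 50/50 split, shallow random forest
iris = datasets.load_iris()
y = label_binarize(iris.target, classes=[0, 1, 2])
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, y, test_size=.5, random_state=0)
classifier = RandomForestClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

# One (n_samples, 2) array per class; column 1 is P(sample belongs to that class)
y_score = np.column_stack([proba[:, 1] for proba in classifier.predict_proba(X_test)])

# average=None: one one-vs-rest AUC per column of the label matrix
per_class = metrics.roc_auc_score(y_test, y_score, average=None)
# average="macro": unweighted mean of the per-class AUCs
macro = metrics.roc_auc_score(y_test, y_score, average="macro")
print(per_class, macro)
```

The per-class values should match the ones computed with `roc_curve` and `auc` in the loop, since `roc_auc_score` builds the same curve for each column.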
Note that the closer the AUC is to 1, the better the classifier separates that class from the rest; an AUC of 0.5 is no better than random guessing.
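Three hand-made score vectors (illustrative only, not real model output) mark the reference points: a perfect ranking gives 1.0, a completely reversed ranking gives 0.0, and constant scores, which carry no ranking information, give 0.5.

```python
from sklearn import metrics

y_true = [0, 0, 1, 1]
cases = {
    "perfect": [0.1, 0.2, 0.8, 0.9],   # every positive outscores every negative
    "inverted": [0.9, 0.8, 0.2, 0.1],  # every negative outscores every positive
    "constant": [0.5, 0.5, 0.5, 0.5],  # all ties: no ranking information
}

results = {name: metrics.roc_auc_score(y_true, s) for name, s in cases.items()}
for name, value in results.items():
    print(name, value)
# perfect 1.0
# inverted 0.0
# constant 0.5
```

An AUC well below 0.5 therefore means the model's ranking is systematically reversed rather than merely uninformative.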
That's it! You have now successfully understood the concept of ROC and AUC, and also implemented them in Python. This will be a handy tool while working on classification problems in Machine Learning. Keep learning and practicing!