Demystifying the Naïve Bayes Classifier in Python

Introduction

The Naïve Bayes classifier is a popular statistical technique in Machine Learning and Artificial Intelligence, used especially often for text classification problems. The model is attractive because it is simple and handles high-dimensional feature vectors well, which makes it a good option to consider when dealing with text data.

In this blog post, we'll look at how to build a simple Naïve Bayes classifier in Python using scikit-learn.

What is a Naïve Bayes Classifier?

The Naïve Bayes classifier is a probabilistic classifier based on Bayes' theorem. It is called "naïve" because it assumes that the features are conditionally independent of each other given the class. For each class, it estimates the probability that a given record or data point belongs to that class. The class with the highest probability is taken as the prediction.
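
Written out, the idea is Bayes' theorem combined with the assumption that features are conditionally independent given the class. The notation below is introduced here for illustration only: x_1, ..., x_n are the feature values of one record and y is a candidate class.

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
  \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)

\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The denominator is the same for every class, so it can be dropped when we only need to know which class scores highest.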

Python Implementation

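We'll use GaussianNB from sklearn's naive_bayes module for our implementation. GaussianNB models each feature within a class as normally distributed, which suits continuous measurements. To illustrate how it works, let's use the popular Iris dataset.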

First, let's import the necessary libraries:

from sklearn import datasets
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

The next step is to load the dataset and split it into a training set and a test set:

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=109)
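
As a quick sanity check, it can help to look at what the split produced. The sketch below repeats the split from above and then shows an optional variant that is not part of the original walkthrough: passing stratify so that each class keeps the same proportion in both sets. The names Xs_train, Xs_test, ys_train and ys_test are made up for this illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Same split as in the post: 30% of the 150 samples are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=109
)
print(X_train.shape, X_test.shape)
# With a plain random split, class counts in the test set may be uneven.
print(np.bincount(y_test))

# Optional variant (an assumption, not in the original post):
# stratify on the labels to keep class proportions equal in both sets.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=109,
    stratify=iris.target,
)
print(np.bincount(ys_test))
```

Stratifying matters more on small or imbalanced datasets, where a random split can leave a class under-represented in the test set.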

Now, we can train the model and make predictions:

model = GaussianNB()
model.fit(X_train, y_train)

expected = y_test
predicted = model.predict(X_test)
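
The predict call returns only the winning class. To see the per-class membership probabilities described earlier, one option is predict_proba. The following is a small self-contained sketch that repeats the setup above; the variable name proba is chosen here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=109
)
model = GaussianNB().fit(X_train, y_train)

# One row per test sample, one column per class; each row sums to 1.
proba = model.predict_proba(X_test)
predicted = model.predict(X_test)

# Class probabilities and the predicted label for the first test sample.
print(dict(zip(iris.target_names, proba[0].round(3))))
print(iris.target_names[predicted[0]])
```

The predicted label is the class whose column holds the largest probability in that row.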

Finally, we can print the metrics to see how well our model performed:

print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
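
To get a feel for what a Gaussian Naïve Bayes model computes, here is a minimal from-scratch sketch using NumPy. It is a teaching aid under stated assumptions, not scikit-learn's actual implementation: TinyGaussianNB is a hypothetical class name, and the small constant added to the variances is an arbitrary choice for numerical safety.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


class TinyGaussianNB:
    """Hypothetical teaching class; not part of scikit-learn."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Class priors P(y): the fraction of training samples in each class.
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        # Per-class mean and variance of every feature.
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # The added constant (an assumption) keeps variances away from zero.
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        # For each class: log P(y) + sum_i log N(x_i | mean, var).
        log_post = []
        for prior, mean, var in zip(self.priors_, self.means_, self.vars_):
            log_lik = -0.5 * np.sum(
                np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1
            )
            log_post.append(np.log(prior) + log_lik)
        # Pick the class with the highest log-posterior for each sample.
        return self.classes_[np.argmax(np.array(log_post), axis=0)]


iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=109
)

scratch_pred = TinyGaussianNB().fit(X_train, y_train).predict(X_test)
sklearn_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

print("scratch accuracy:", np.mean(scratch_pred == y_test))
print("agreement with GaussianNB:", np.mean(scratch_pred == sklearn_pred))
```

Working in log space avoids multiplying many small probabilities together, which would otherwise risk numerical underflow.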

Summary

In this blog post, we briefly explored the Naïve Bayes classifier and applied it in Python using GaussianNB from sklearn's naive_bayes module. Despite its "naïve" assumption that features are independent, Naïve Bayes often performs surprisingly well. It is widely used because it is simple, fast to train, and can be competitive with more sophisticated classification methods, particularly on small or high-dimensional datasets.

Remember, the key to becoming proficient with machine learning algorithms is to understand the theory deeply and to practice on different use cases and datasets. Keep exploring!
