The Naive Bayes classifier is a popular probabilistic technique in machine learning, widely used for text classification problems. It is attractive because it is simple, fast to train, and copes well with high-dimensional feature vectors such as word counts, which makes it a good first option when dealing with text data.
In this blog post, we'll look at how to train a simple Naive Bayes classifier in Python using scikit-learn.
The Naive Bayes classifier is based on Bayes' theorem, combined with the "naive" assumption that the features are conditionally independent of one another given the class. For a given record or data point, it estimates the probability that the point belongs to each class, and the class with the highest probability is taken as the prediction.
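In symbols (a standard formulation, written in our own notation), for a data point with features $x_1, \dots, x_n$ and a candidate class $y$, Bayes' theorem plus the conditional-independence assumption gives the posterior below. The denominator $P(x_1, \dots, x_n)$ is the same for every class, so it can be dropped when we only need the most likely class $\hat{y}$.

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
  \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)

\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```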
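To illustrate how it works, let's use the popular Iris dataset, which contains 150 flowers, each described by 4 continuous measurements and labelled with one of 3 species. Because the features are continuous, we'll use GaussianNB from scikit-learn's naive_bayes module, which models each feature within a class as a normal (Gaussian) distribution.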
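To make the Gaussian model concrete, here is a minimal from-scratch sketch in NumPy. For each class it estimates the prior P(y) and the mean and variance of every feature, then scores a new point by adding log P(y) to the sum of the per-feature log normal densities and picking the highest score. The class name `SimpleGaussianNB` and its absolute `eps` variance floor are hypothetical choices for illustration, not scikit-learn's API; scikit-learn's GaussianNB adds a comparable term (scaled by the largest feature variance) through its `var_smoothing` parameter, so its numbers may differ slightly.

```python
import numpy as np


class SimpleGaussianNB:
    """Hypothetical minimal Gaussian Naive Bayes, for illustration only."""

    def __init__(self, eps=1e-9):
        # Small absolute floor added to every variance to avoid division by zero
        self.eps = eps

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Class priors P(y), plus per-class mean and variance of every feature
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + self.eps
        return self

    def _joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        # log N(x_i; mean, var) = -0.5 * log(2*pi*var) - 0.5 * (x_i - mean)^2 / var
        log_norm = -0.5 * np.log(2.0 * np.pi * self.vars_)        # (n_classes, n_features)
        sq = (X[:, None, :] - self.means_) ** 2 / self.vars_      # (n_samples, n_classes, n_features)
        # Sum over features (naive independence), then add log P(y)
        return np.log(self.priors_) + np.sum(log_norm - 0.5 * sq, axis=2)

    def predict(self, X):
        # Class with the highest unnormalised log posterior for each sample
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]


# Usage on a tiny, made-up two-class dataset
X_toy = np.array([[1.0, 2.0], [1.2, 1.8], [5.0, 8.0], [5.2, 8.1]])
y_toy = np.array([0, 0, 1, 1])
toy_model = SimpleGaussianNB().fit(X_toy, y_toy)
toy_pred = toy_model.predict([[1.1, 2.1], [5.1, 7.9]])
print(toy_pred)  # [0 1]
```

Working in log space turns the product of many small probabilities into a sum, which avoids numerical underflow when there are many features.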
First, let's import the necessary libraries:
```python
from sklearn import datasets
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
```
The next step is to load the dataset and split it into a training set and a test set:
```python
# Load the Iris data and hold out 30% of it as a test set
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=109
)
```
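As an optional sanity check (our own addition, not part of the original steps), printing the shapes shows what `test_size=0.3` does to Iris's 150 samples of 4 features: 30% of 150 is 45 test rows, leaving 105 for training. Fixing `random_state` makes the shuffle, and therefore the split, reproducible.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=109
)

# 150 samples x 4 features in total; 30% of the rows go to the test set
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
```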
Now, we can train the model and make predictions:
```python
# Estimate the per-class Gaussian parameters, then predict labels for the test set
model = GaussianNB()
model.fit(X_train, y_train)
expected = y_test
predicted = model.predict(X_test)
```
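The class membership probabilities mentioned earlier can be read directly from the fitted model with `predict_proba`, and `predict` simply returns the class whose probability is highest. The sketch below repeats the setup so it runs on its own and checks that relationship; it also prints `class_prior_` (the estimated P(y)) and the shape of `theta_` (the per-class feature means), both attributes of a fitted GaussianNB.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=109
)
model = GaussianNB().fit(X_train, y_train)
predicted = model.predict(X_test)

# One row per test sample, one column per class in model.classes_
proba = model.predict_proba(X_test)
print(proba[:3].round(3))

# Each row is a probability distribution over the three classes...
row_sums = proba.sum(axis=1)
# ...and predict() picks the class with the highest probability
from_proba = model.classes_[np.argmax(proba, axis=1)]

# Learned parameters: class priors P(y) and per-class feature means
print(model.class_prior_)
print(model.theta_.shape)  # (3, 4)
```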
Finally, we can print a classification report (precision, recall and F1-score for each class) and a confusion matrix to see how well the model performed:
```python
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
```
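In scikit-learn's confusion matrix, row i counts test samples whose true class is i and column j counts those predicted as class j, so correct predictions sit on the diagonal. Overall accuracy is therefore the diagonal sum divided by the total, which this short sketch checks against `metrics.accuracy_score` (the setup is repeated so the snippet runs on its own).

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=109
)
predicted = GaussianNB().fit(X_train, y_train).predict(X_test)

cm = metrics.confusion_matrix(y_test, predicted)
# Rows are true classes, columns are predicted classes; the diagonal holds correct predictions
acc_from_cm = cm.trace() / cm.sum()
acc = metrics.accuracy_score(y_test, predicted)
print(f"accuracy from confusion matrix: {acc_from_cm:.3f}, accuracy_score: {acc:.3f}")
```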
In this blog post, we briefly explored the idea behind the Naive Bayes classifier and applied it in Python using GaussianNB from scikit-learn's naive_bayes module. Despite its "naive" independence assumption, Naive Bayes often performs surprisingly well, and it is widely used because it is fast, needs relatively little training data, and can be competitive with more sophisticated classification methods, particularly on text.
Always remember: the key to becoming proficient with machine learning algorithms is to understand the theory deeply and to practice on different use cases and datasets. Keep exploring!