Image classification is one of the most important industrial use cases of machine learning today. It is used to identify objects, facial features, colors, and other content in a given image.
One of the most commonly used families of models for image classification is the convolutional neural network (CNN). CNNs can extract complex patterns from images and classify them accurately, but they can be computationally expensive to train.
Generative Adversarial Networks (GANs) have also attracted attention in the field of image classification. A GAN combines two neural networks, a generator and a discriminator, that are trained against each other: the generator learns to produce realistic images, while the discriminator learns to tell them apart from real ones.
The generator network takes in a noise vector and produces images of the same shape and size as the training images. The discriminator network receives both real training images and images produced by the generator, and decides whether each one belongs to the actual training data or not.
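Using GANs for image classification can make the process faster in some settings: the networks can be trained on smaller datasets, and simple architectures like the ones used in this post need relatively little computation.
Moreover, images sampled from a trained generator can be added to the training set as a form of data augmentation, which can improve how well a classifier generalizes.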
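As a rough illustration of the augmentation idea, the sketch below mixes generator samples into a real dataset. The helper name `augment_with_generated`, the `toy_generator`, and the random stand-in data are all assumptions made for this example. The generator here is untrained and only produces noise, so in practice a trained generator would take its place. Note also that an unconditional generator provides no class labels, which the sketch marks with -1.

```python
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, TensorDataset


def augment_with_generated(real_ds, generator, n_fake, noise_dim):
    """Return a dataset holding the real samples plus n_fake generated ones."""
    with torch.no_grad():
        fake = generator(torch.randn(n_fake, noise_dim))
    # An unconditional generator gives no class labels; -1 marks "unlabeled".
    fake_labels = torch.full((n_fake,), -1, dtype=torch.long)
    return ConcatDataset([real_ds, TensorDataset(fake, fake_labels)])


# Stand-in data shaped like flattened MNIST (hypothetical, random values).
real_ds = TensorDataset(torch.rand(100, 784), torch.randint(0, 10, (100,)))
toy_generator = nn.Sequential(nn.Linear(64, 784), nn.Tanh())  # untrained

augmented = augment_with_generated(real_ds, toy_generator, n_fake=50, noise_dim=64)
print(len(augmented))  # 100 real + 50 generated samples
```

How the unlabeled samples are used afterwards (for example, labeling them with an existing classifier, or switching to a class-conditional generator) is a separate design choice that this sketch does not cover.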
In this blog post, we'll discuss how to apply GANs to the task of image classification. We'll use the popular PyTorch library for the implementation.
We'll use the MNIST dataset to demonstrate the application of GANs in image classification.
Let's start by loading the dataset.
    import torch
    import torchvision

    # load the MNIST training split as tensors with values in [0, 1]
    trainset = torchvision.datasets.MNIST(
        root='./data',
        train=True,
        download=True,
        transform=torchvision.transforms.ToTensor(),
    )
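Since the networks in this post are fully connected, each 28 x 28 image has to be flattened into a vector of 784 values before it is fed in. The sketch below shows batching and flattening with a `DataLoader`. To avoid a download it uses random tensors as a stand-in for MNIST (an assumption for this example); the `trainset` object loaded above could be passed to `DataLoader` in the same way. The batch size of 64 is also just an illustrative choice.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for MNIST: 256 random "images" of shape (1, 28, 28) with digit labels.
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

# Take one mini-batch and flatten each image into a 784-dimensional vector.
batch_images, batch_labels = next(iter(loader))
flat = batch_images.reshape(-1, 784)  # (64, 1, 28, 28) -> (64, 784)
print(batch_images.shape, flat.shape)
```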
Next, we need to define our GAN model. We'll use the torch.nn module for this purpose.
The discriminator and the generator networks will be neural networks with fully connected layers.
We'll define our Discriminator class as follows:
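    import torch.nn as nn

    class Discriminator(nn.Module):
        """Scores a flattened image with the probability that it is real."""

        def __init__(self, input_dim):
            super().__init__()
            self.fc1 = nn.Linear(input_dim, 128)
            self.fc2 = nn.Linear(128, 1)
            self.leaky_relu = nn.LeakyReLU(0.2)
            self.sigmoid = nn.Sigmoid()

        def forward(self, x):
            x = self.leaky_relu(self.fc1(x))
            # the sigmoid squashes the score into (0, 1)
            return self.sigmoid(self.fc2(x))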
We'll define our Generator class as follows:
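    class Generator(nn.Module):
        """Maps a noise vector to a flattened image."""

        def __init__(self, input_dim, output_dim):
            super().__init__()
            self.fc1 = nn.Linear(input_dim, 128)
            self.fc2 = nn.Linear(128, output_dim)
            self.leaky_relu = nn.LeakyReLU(0.2)
            self.tanh = nn.Tanh()

        def forward(self, x):
            x = self.leaky_relu(self.fc1(x))
            # tanh keeps the generated pixel values in [-1, 1]
            return self.tanh(self.fc2(x))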
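Here, input_dim and output_dim are the sizes of each network's input and output vectors. For the discriminator, input_dim is the flattened image size (28 × 28 = 784 for MNIST). For the generator, input_dim is the length of the noise vector and output_dim is the flattened image size.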
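To make the dimensions concrete, here is a small shape check. It is a sketch under stated assumptions: the two networks are rebuilt with `nn.Sequential` as compact equivalents of the classes above so that the snippet runs on its own, and the noise length is set to 784 because the training loop in this post draws noise with the same width as the flattened image. The batch size of 16 is arbitrary.

```python
import torch
import torch.nn as nn

noise_dim, image_dim = 784, 784  # assumed sizes; 784 = 28 * 28

# Compact equivalents of the Generator and Discriminator classes.
generator = nn.Sequential(
    nn.Linear(noise_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, image_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(image_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

noise = torch.randn(16, noise_dim)
fake_images = generator(noise)       # shape (16, 784), values in [-1, 1]
scores = discriminator(fake_images)  # shape (16, 1), one probability per image
print(fake_images.shape, scores.shape)
```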
With our models defined, we can now move on to training them. We'll use the Adam optimizer for this purpose.
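    # instantiate the networks (noise and flattened images are both 784 wide)
    discriminator = Discriminator(input_dim=784)
    generator = Generator(input_dim=784, output_dim=784)
    # define the optimizers
    d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=0.0003)
    g_optimizer = torch.optim.Adam(generator.parameters(), lr=0.0003)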
Once the optimizers are configured, we can define the training loop, which alternates between updating the discriminator and updating the generator.
    criterion = torch.nn.BCELoss()
    # iterate in shuffled mini-batches rather than one image at a time
    train_loader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

    for epoch in range(10000):
        for idx, (images, labels) in enumerate(train_loader):
            real_images = images.reshape(-1, 784)

            # train the discriminator on real images (target 1)
            d_optimizer.zero_grad()
            disc_real_output = discriminator(real_images)
            d_loss_real = criterion(disc_real_output, torch.ones_like(disc_real_output))
            d_loss_real.backward()

            # generate fake images from noise (target 0); detach so that
            # only the discriminator receives gradients in this step
            noise = torch.randn(real_images.shape[0], real_images.shape[1])
            noise_images = generator(noise)
            disc_fake_output = discriminator(noise_images.detach())
            d_loss_fake = criterion(disc_fake_output, torch.zeros_like(disc_fake_output))
            d_loss_fake.backward()
            d_optimizer.step()

            # train the generator to make the discriminator output 1
            g_optimizer.zero_grad()
            world_images = generator(noise)
            disc_world_output = discriminator(world_images)
            g_loss = criterion(disc_world_output, torch.ones_like(disc_world_output))
            g_loss.backward()
            g_optimizer.step()
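We'll run the training loop for 10,000 epochs. This is a long run; for a quick experiment, a much smaller number of epochs is enough to see the losses and the generated samples change.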
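For reference, the binary cross-entropy terms in the loop can be written out explicitly. The symbols are introduced here and do not appear in the code: D is the discriminator, G the generator, x a real image, and z a noise vector. The discriminator loss is the sum of the real-image term (target 1) and the generated-image term (target 0), and the generator loss is the cross-entropy of the discriminator's output on generated images against the target 1:

```latex
\begin{aligned}
\mathcal{L}_D &= -\,\mathbb{E}_{x}\big[\log D(x)\big] \;-\; \mathbb{E}_{z}\big[\log\big(1 - D(G(z))\big)\big], \\
\mathcal{L}_G &= -\,\mathbb{E}_{z}\big[\log D(G(z))\big].
\end{aligned}
```

In the code, the expectations are approximated by averaging over the current batch, which is what `BCELoss` does by default.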
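Finally, we can test our model. The discriminator outputs a single probability that an image is real, so the accuracy measured here is for the real-versus-generated decision, not for recognizing which digit an image shows.

    # test the discriminator as a real-vs-generated classifier on held-out data
    testset = torchvision.datasets.MNIST(
        root='./data', train=False, download=True,
        transform=torchvision.transforms.ToTensor(),
    )
    test_loader = torch.utils.data.DataLoader(testset, batch_size=256)

    correct = 0
    total = 0
    with torch.no_grad():
        for images, _ in test_loader:  # digit labels are not used here
            real_images = images.reshape(-1, 784)
            fake_images = generator(torch.randn(real_images.shape[0], 784))
            # real images should score above 0.5, generated ones at or below it
            correct += (discriminator(real_images) > 0.5).sum().item()
            correct += (discriminator(fake_images) <= 0.5).sum().item()
            total += 2 * real_images.shape[0]

    # print the accuracy
    print('Accuracy : {:.1f} %'.format(correct / total * 100))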
On running the code, we get an accuracy of 90.5% on the MNIST dataset; the exact figure will vary with the random seed and the length of training. This suggests that GANs can be used effectively in image classification tasks.
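The accuracy computation itself can be sketched in isolation. The helper `binary_accuracy` and the example scores below are hypothetical, chosen only to show how thresholding the discriminator's probabilities at 0.5 turns them into 0/1 predictions that are then compared with 0/1 targets (1 for real, 0 for generated).

```python
import torch


def binary_accuracy(scores, targets, threshold=0.5):
    """Fraction of scores whose thresholded value matches the 0/1 targets."""
    predictions = (scores > threshold).float()
    return (predictions == targets).float().mean().item()


# Hypothetical discriminator outputs: first three for real images,
# last three for generated images.
scores = torch.tensor([0.9, 0.8, 0.4, 0.2, 0.6, 0.1])
targets = torch.tensor([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

accuracy = binary_accuracy(scores, targets)
print(f"{accuracy:.3f}")  # prints 0.667 (4 of 6 correct)
```

A threshold of 0.5 is the natural default for a sigmoid output, but it is a choice: moving it trades off mistakes on real images against mistakes on generated ones.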
In this blog post, we discussed how to apply GANs to image classification tasks. We described how GANs generate images, how generated samples can be used for data augmentation, and how the trained discriminator can be used as a classifier.
We implemented our GAN model in PyTorch, trained it on the popular MNIST dataset, and achieved an accuracy of 90.5%.