Applying Adversarial Attacks On Machine Learning Models

In this blog post, we'll delve into adversarial attacks on machine learning models, focusing in particular on the Fast Gradient Sign Method (FGSM).

Adversarial attacks are a type of cyber threat in which an attacker manipulates the behaviour of an AI model through maliciously crafted input. Among the various kinds of adversarial attacks, FGSM is one of the most straightforward: it makes slight alterations to the original input that mislead the machine learning model into producing an incorrect prediction.

What is FGSM?

The Fast Gradient Sign Method (FGSM) is simple yet effective. It uses the gradient of the loss with respect to the input data to craft an adversarial example: the original image is modified by adding a small perturbation in the direction of the sign of that gradient, which is often enough to make the model misclassify the perturbed image.
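
Concretely, writing sign(·) for the element-wise sign function, the attack computes

    perturbed_image = image + epsilon * sign( gradient of the loss w.r.t. the image )

where epsilon controls the size of each per-pixel step. This is exactly what the fgsm_attack() function defined below implements.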

First, import the necessary Python libraries and load a pre-trained model.

import torch
import torchvision.models as models
import torchvision.transforms as transforms

# Load a pre-trained ResNet-50 model from torchvision
model = models.resnet50(pretrained=True)
model.eval()  # set the model to evaluation mode
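
To have something to attack, we also need an input image as a tensor. The snippet below is a minimal sketch: it assumes a local image file named cat.jpg (a hypothetical filename; any RGB image will do) and, for simplicity, keeps pixel values in [0, 1] rather than applying the usual ImageNet mean/std normalization, so that the [0, 1] clamping used by the attack stays meaningful. It also uses the model's own prediction as the "true" label for demonstration purposes.

from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # converts the image to a float tensor with values in [0, 1]
])

img = Image.open("cat.jpg").convert("RGB")  # hypothetical local image file
image = preprocess(img).unsqueeze(0)        # add a batch dimension: shape [1, 3, 224, 224]

# Use the model's own prediction as the label for this demonstration
with torch.no_grad():
    label = model(image).argmax(dim=1)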

Let's define the FGSM attack function.

def fgsm_attack(image, epsilon, data_grad):
    # Collect the element-wise sign of the data gradient
    sign_data_grad = data_grad.sign()
    # Create the perturbed image by adjusting each pixel of the input image
    perturbed_image = image + epsilon * sign_data_grad
    # Add clipping to keep the result in the [0, 1] range
    perturbed_image = torch.clamp(perturbed_image, 0, 1)
    # Return the perturbed image
    return perturbed_image

Here, epsilon determines the magnitude of the attack. The larger the epsilon, the more noticeable the perturbation in the original image, and the higher the chance of an incorrect classification.
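
A quick way to see what epsilon does is to check how far the perturbed image moves from the original. The sketch below uses random tensors as stand-ins for a real image and its gradient; because every pixel shifts by at most epsilon, the printed maximum per-pixel change comes out to roughly epsilon itself.

# Random stand-ins for a real image (values in [0, 1]) and its gradient
dummy_image = torch.rand(1, 3, 224, 224)
dummy_grad = torch.randn(1, 3, 224, 224)

for eps in (0.01, 0.05, 0.1):
    perturbed = fgsm_attack(dummy_image, eps, dummy_grad)
    print(eps, (perturbed - dummy_image).abs().max().item())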

How to use the FGSM attack?

Next, we define a function that applies the FGSM attack to a given image. It first computes the loss and the gradient of the loss with respect to the image, and then calls fgsm_attack().

def attack(model, loss, image, label, epsilon):
    # Set requires_grad so we can compute gradients with respect to the image
    image.requires_grad = True
    # Forward pass the data through the model
    output = model(image)
    # Get the index of the max log-probability
    init_pred = output.max(1, keepdim=True)[1]
    # If the initial prediction is already wrong, do not bother attacking
    if init_pred.item() != label.item():
        return image
    # Calculate the loss
    loss_val = loss(output, label)
    # Zero all existing gradients
    model.zero_grad()
    # Calculate gradients of the model in the backward pass
    loss_val.backward()
    # Collect the gradient of the loss w.r.t. the image
    data_grad = image.grad.data
    # Call the FGSM attack
    perturbed_data = fgsm_attack(image, epsilon, data_grad)
    return perturbed_data
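
Putting it all together, here is a minimal sketch of how the attack could be run, using the model, image, and label prepared above and a standard cross-entropy loss. Since the label was taken from the model's own prediction, the initial-prediction check always passes; sweeping over a few epsilon values shows at what perturbation strength the prediction starts to flip.

criterion = torch.nn.CrossEntropyLoss()

for epsilon in (0.0, 0.01, 0.05, 0.1):
    # Clone the input so each run starts from the clean image
    adversarial = attack(model, criterion, image.clone(), label, epsilon)
    with torch.no_grad():
        adv_pred = model(adversarial).argmax(dim=1)
    print(f"epsilon={epsilon}: original class {label.item()}, "
          f"adversarial class {adv_pred.item()}")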

Your AI model might perform very well on clean inputs, but as this attack shows, machine learning models remain vulnerable to adversarial examples. Considering security alongside accuracy during model development and deployment is therefore crucial.