Applying Adversarial Attacks On Machine Learning Models

In this blog post, we'll delve into adversarial attacks on machine learning models, focusing in particular on the Fast Gradient Sign Method (FGSM).

Adversarial attacks are a type of cyber threat in which an attacker manipulates the behaviour of an AI model through maliciously crafted input. Among the various kinds of adversarial attacks, FGSM is one of the most straightforward: it makes slight alterations to the original input that mislead the machine learning model into producing an incorrect prediction.

What is FGSM?

The Fast Gradient Sign Method (FGSM) is simple yet effective. It uses the gradient of the loss with respect to the input data to craft an adversarial example: the original image is modified by adding a small perturbation in the direction of the sign of that gradient, which is often enough to make the model misclassify the perturbed image.
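
Concretely, writing sign(·) for the element-wise sign function, the attack computes

    perturbed_image = image + epsilon * sign( gradient of the loss w.r.t. the image )

where epsilon controls the size of each per-pixel step. This is exactly what the fgsm_attack() function defined below implements.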

First, import the necessary Python libraries and load a pre-trained model.

import torch
import torchvision.models as models
import torchvision.transforms as transforms

# Load a pre-trained ResNet-50 model from torchvision
model = models.resnet50(pretrained=True)
model.eval()  # set the model to evaluation mode
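
To have something to attack, we also need an input image as a tensor. The snippet below is a minimal sketch: it assumes a local image file named cat.jpg (a hypothetical filename; any RGB image will do) and, for simplicity, keeps pixel values in [0, 1] rather than applying the usual ImageNet mean/std normalization, so that the [0, 1] clamping used by the attack stays meaningful. It also uses the model's own prediction as the "true" label for demonstration purposes.

from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # converts the image to a float tensor with values in [0, 1]
])

img = Image.open("cat.jpg").convert("RGB")  # hypothetical local image file
image = preprocess(img).unsqueeze(0)        # add a batch dimension: shape [1, 3, 224, 224]

# Use the model's own prediction as the label for this demonstration
with torch.no_grad():
    label = model(image).argmax(dim=1)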

Let's define the FGSM attack function.

def fgsm_attack(image, epsilon, data_grad):
    # Collect the element-wise sign of the data gradient
    sign_data_grad = data_grad.sign()
    # Create the perturbed image by adjusting each pixel of the input image
    perturbed_image = image + epsilon * sign_data_grad
    # Add clipping to keep the result in the [0, 1] range
    perturbed_image = torch.clamp(perturbed_image, 0, 1)
    # Return the perturbed image
    return perturbed_image

Here, epsilon determines the magnitude of the attack. The larger the epsilon, the more noticeable the perturbation in the original image, and the higher the chance of an incorrect classification.
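
A quick way to see what epsilon does is to check how far the perturbed image moves from the original. The sketch below uses random tensors as stand-ins for a real image and its gradient; because every pixel shifts by at most epsilon, the printed maximum per-pixel change comes out to roughly epsilon itself.

# Random stand-ins for a real image (values in [0, 1]) and its gradient
dummy_image = torch.rand(1, 3, 224, 224)
dummy_grad = torch.randn(1, 3, 224, 224)

for eps in (0.01, 0.05, 0.1):
    perturbed = fgsm_attack(dummy_image, eps, dummy_grad)
    print(eps, (perturbed - dummy_image).abs().max().item())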

How to use the FGSM attack?

Next, we define a function that applies the FGSM attack to a given image. It first computes the loss and the gradient of the loss with respect to the image, and then calls fgsm_attack().

def attack(model, loss, image, label, epsilon):
    # Set requires_grad so we can compute gradients with respect to the image
    image.requires_grad = True
    # Forward pass the data through the model
    output = model(image)
    # Get the index of the max log-probability
    init_pred = output.max(1, keepdim=True)[1]
    # If the initial prediction is already wrong, do not bother attacking
    if init_pred.item() != label.item():
        return image
    # Calculate the loss
    loss_val = loss(output, label)
    # Zero all existing gradients
    model.zero_grad()
    # Calculate gradients of the model in the backward pass
    loss_val.backward()
    # Collect the gradient of the loss w.r.t. the image
    data_grad = image.grad.data
    # Call the FGSM attack
    perturbed_data = fgsm_attack(image, epsilon, data_grad)
    return perturbed_data
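
Putting it all together, here is a minimal sketch of how the attack could be run, using the model, image, and label prepared above and a standard cross-entropy loss. Since the label was taken from the model's own prediction, the initial-prediction check always passes; sweeping over a few epsilon values shows at what perturbation strength the prediction starts to flip.

criterion = torch.nn.CrossEntropyLoss()

for epsilon in (0.0, 0.01, 0.05, 0.1):
    # Clone the input so each run starts from the clean image
    adversarial = attack(model, criterion, image.clone(), label, epsilon)
    with torch.no_grad():
        adv_pred = model(adversarial).argmax(dim=1)
    print(f"epsilon={epsilon}: original class {label.item()}, "
          f"adversarial class {adv_pred.item()}")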

Your AI model might perform very well on clean inputs, but as this attack shows, machine learning models remain vulnerable to adversarial examples. Considering security alongside accuracy during model development and deployment is therefore crucial.