Calibrating A Noisy Quantum Kernel With Grid Search For Adversarial Shift Robustness

Why I got obsessed with “shift robustness” in quantum kernels

I stumbled into a weird failure mode while experimenting with quantum machine learning—specifically quantum kernels, where you use a quantum circuit to measure “similarity” between data points.

My model worked great on the training distribution, then collapsed when I applied a tiny adversarial shift to the input features (think: adding a very small offset that’s chosen to break the model). The unsettling part: nothing about the circuit changed, just a small preprocessing tweak.

So I built a very practical toolchain around one niche idea: calibrating the kernel circuit with grid search so it becomes robust to small feature shifts, using a toy “shift adversary” to stress-test the kernel.

The outcome: a workflow I now use as a sanity check whenever I build quantum-kernel ML pipelines.

The core idea: a quantum kernel as a similarity matrix

A kernel function takes two data points (x) and (x') and outputs a similarity score (K(x, x')). A kernel method then uses that similarity matrix for learning.

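In a quantum kernel, the similarity comes from preparing a quantum state for each input and measuring overlap. Concretely, with a data-dependent circuit (U(x)), the kernel is defined as the squared state overlap (the fidelity), which in practice is estimated from a finite number of measurement shots: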

[ K(x, x') = |\langle \psi(x) | \psi(x') \rangle|^2 ]

where (|\psi(x)\rangle = U(x)\,|0\rangle).

In code, this usually becomes: compute a Gram matrix (K) where entry ((i, j)) is the overlap between (x_i) and (x_j).
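
To make the Gram matrix concrete, here is a minimal sketch for the simplest possible case: one feature, one qubit, and a single RY(pi * x) rotation. That setup is an assumption for illustration (it is not the full feature map used later), and the helper names are hypothetical. In this case the state is real-valued and the kernel has the closed form cos^2(pi * (x - x') / 2).

```python
import math


def ry_state(x):
    """Amplitudes of RY(pi * x)|0> for a single qubit (real-valued)."""
    theta = math.pi * x
    return (math.cos(theta / 2), math.sin(theta / 2))


def fidelity_kernel(x, xp):
    """K(x, x') = |<psi(x)|psi(x')>|^2 for the one-qubit encoding above."""
    a, b = ry_state(x), ry_state(xp)
    overlap = a[0] * b[0] + a[1] * b[1]
    return overlap ** 2


def gram_matrix(xs):
    """Gram matrix with entry (i, j) = K(x_i, x_j)."""
    return [[fidelity_kernel(xi, xj) for xj in xs] for xi in xs]


# Three sample points; the diagonal is 1 and the matrix is symmetric.
K = gram_matrix([0.0, 0.25, 0.5])
```

The exact values here are what the shot-based estimate converges to as the number of shots grows.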

The niche part: adversarial shift robustness calibration

I wanted a kernel that doesn’t just look good on clean data. So I introduced an adversarial preprocessing step:

Start with a dataset (X).
Apply a small shift (\delta) to get (X_\delta = X + \delta).
Choose the (\delta), from a small candidate set, that hurts performance the most.

Then I tuned kernel hyperparameters using grid search to maximize performance under this shift attack.

This gives a concrete objective:

Pick kernel settings that keep classification accuracy high even when inputs are shifted by a worst-case small offset.
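
The objective above is a max-min rule: for each setting take the minimum accuracy over the candidate shifts, then pick the setting with the largest minimum. A minimal sketch, with hypothetical function names and placeholder numbers that are purely illustrative (they are not measured results):

```python
def worst_case_accuracy(acc_by_shift):
    """Worst-case (minimum) accuracy over all candidate shifts."""
    return min(acc_by_shift.values())


def pick_robust_setting(acc_table):
    """Return the setting whose worst-case accuracy is highest (max-min rule)."""
    return max(acc_table, key=lambda s: worst_case_accuracy(acc_table[s]))


# Illustrative placeholder numbers only, not measured results.
acc_table = {
    "setting_A": {"clean": 0.95, "+eps": 0.70, "-eps": 0.90},
    "setting_B": {"clean": 0.90, "+eps": 0.85, "-eps": 0.85},
}
best = pick_robust_setting(acc_table)
```

With these placeholder values, setting_A wins on clean accuracy but setting_B wins on worst-case accuracy, which is exactly the trade-off the calibration is meant to expose.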

Building blocks I used (and what they mean)

I used:

Feature map circuit: turns real-valued features into quantum states.
Qiskit primitives: runs circuits and gets measurement results.
Kernel Gram matrix: builds (K_{ij}) from circuit overlap estimates.
Classical kernel SVM: trains a classifier on the kernel matrix.
Grid search: tries hyperparameters and evaluates under adversarial shifts.

The implementation below uses:

Qiskit for circuits and sampling
scikit-learn for the SVM
a manual shift adversary (small discrete candidate shifts) to keep everything transparent and reproducible.

Working code: quantum kernel + adversarial shift grid search

import numpy as np

from qiskit import QuantumCircuit
from qiskit_aer.primitives import SamplerV2

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from itertools import product


def feature_map(x, reps=1, entanglement=0.5):
    """
    Create a simple data-encoding circuit U(x).

    x: 1D array of length d
    reps: number of repetition blocks
    entanglement: controls strength of entangling rotations
    """
    d = len(x)
    qc = QuantumCircuit(d)

    for _ in range(reps):
        # Encode each feature as a rotation around Y.
        for i in range(d):
            qc.ry(np.pi * x[i], i)

        # Entangle neighbouring qubits: CX-RY-CX applies RY(+theta) or RY(-theta)
        # to qubit i+1 depending on the state of qubit i.
        # The 'entanglement' hyperparameter scales theta.
        for i in range(d - 1):
            qc.cx(i, i + 1)
            qc.ry(entanglement * np.pi * x[i] * x[i + 1], i + 1)
            qc.cx(i, i + 1)

    return qc


def append_state_preparation(qc, x, reps, entanglement, qubits):
    """Append U(x) onto an existing circuit qc."""
    fm = feature_map(x, reps=reps, entanglement=entanglement)
    qc.compose(fm, qubits=qubits, inplace=True)
    return qc


def estimate_kernel_gram(X, X2=None, reps=1, entanglement=0.5, shots=2048, seed=1234):
    """
    Estimate a Gram matrix K where K[i,j] = |<psi(x_i)|psi(x'_j)>|^2

    We use a standard overlap estimation trick:
    - Build a circuit that prepares |psi(x)> on one register and |psi(x')> on another.
    - Use a SWAP test to estimate the overlap.
    """
    # Remember whether this is a train-vs-train kernel so we can symmetrize it.
    symmetric = X2 is None or X2 is X
    if X2 is None:
        X2 = X

    n1, d = X.shape
    n2, _ = X2.shape

    # Aer's V2 sampler primitive; the seed makes the shot noise reproducible.
    sampler = SamplerV2(seed=seed)

    # We'll use SWAP test with an ancilla + two copies of the system.
    # Total qubits per kernel entry: 1 ancilla + 2*d system qubits
    anc = 0
    sysA = list(range(1, 1 + d))
    sysB = list(range(1 + d, 1 + 2 * d))

    K = np.zeros((n1, n2), dtype=float)

    for i in range(n1):
        for j in range(n2):
            x = X[i]
            xp = X2[j]

            # One classical bit; Qiskit names the default classical register "c".
            qc = QuantumCircuit(1 + 2 * d, 1)

            # Prepare |psi(x)> on sysA and |psi(x')> on sysB
            qc = append_state_preparation(qc, x, reps=reps, entanglement=entanglement, qubits=sysA)
            qc = append_state_preparation(qc, xp, reps=reps, entanglement=entanglement, qubits=sysB)

            # SWAP test:
            # H on ancilla, then for each qubit apply controlled-SWAP between sysA[k] and sysB[k],
            # then H again and measure ancilla.
            qc.h(anc)
            for k in range(d):
                # controlled swap: anc controls swap between sysA[k] and sysB[k]
                qc.cswap(anc, sysA[k], sysB[k])
            qc.h(anc)
            qc.measure(anc, 0)

            # Run sampling
            job = sampler.run([qc], shots=shots)
            result = job.result()
            # get_counts() maps bitstrings ("0" or "1" for one classical bit) to counts.
            counts = result[0].data.c.get_counts()
            # Probability of measuring 0 on the ancilla:
            p0 = counts.get("0", 0) / shots

            # With SWAP test, the overlap relates to p0:
            # p0 = (1 + |<psi|phi>|^2)/2  => |<psi|phi>|^2 = 2*p0 - 1
            # Shot noise can push 2*p0 - 1 outside [0, 1], so clip it.
            overlap_sq = max(0.0, min(1.0, 2 * p0 - 1))
            K[i, j] = overlap_sq

    # Ensure symmetry when X2 is X (helps numerics)
    if symmetric:
        K = 0.5 * (K + K.T)
    return K


def make_toy_data(n=60, d=2, seed=0):
    """
    Two overlapping Gaussian blobs (centered at -0.6 and +0.6) so classification is non-trivial but fast.
    """
    rng = np.random.default_rng(seed)
    X0 = rng.normal(loc=-0.6, scale=0.45, size=(n // 2, d))
    X1 = rng.normal(loc=+0.6, scale=0.45, size=(n // 2, d))
    X = np.vstack([X0, X1])
    y = np.hstack([np.zeros(n // 2), np.ones(n // 2)])
    return X, y.astype(int)


def shift_adversary_candidates(d, eps=0.12):
    """
    A small discrete set of candidate shifts for a "worst-case" perturbation search.
    """
    # Candidate shifts along axes and diagonals
    base = [
        np.zeros(d),
        np.array([eps if k == 0 else 0 for k in range(d)]),
        np.array([-eps if k == 0 else 0 for k in range(d)]),
    ]
    if d >= 2:
        base += [
            np.array([0, eps] + [0] * (d - 2)),
            np.array([0, -eps] + [0] * (d - 2)),
            np.array([eps / 2, eps / 2] + [0] * (d - 2)),
            np.array([-eps / 2, -eps / 2] + [0] * (d - 2)),
        ]
    # Deduplicate
    uniq = []
    for s in base:
        if not any(np.allclose(s, t) for t in uniq):
            uniq.append(s)
    return uniq


def train_kernel_svm(K_train, y_train, C=1.0):
    """
    Train an SVM using a precomputed kernel matrix.
    """
    clf = SVC(kernel="precomputed", C=C)
    clf.fit(K_train, y_train)
    return clf


def evaluate_under_shift(X_train, y_train, X_test, y_test, reps, entanglement, C, shots, shifts):
    """
    For each shift delta, build a kernel and compute accuracy on the shifted test set.
    Return the worst-case (minimum) accuracy across shifts.
    """
    # Precompute training kernel once for a given kernel setting
    K_train = estimate_kernel_gram(X_train, X_train, reps=reps, entanglement=entanglement, shots=shots)

    clf = train_kernel_svm(K_train, y_train, C=C)

    worst_acc = 1.0
    for delta in shifts:
        X_test_shifted = X_test + delta

        # Kernel between test and train: shape (n_test, n_train)
        K_test = estimate_kernel_gram(
            X_test_shifted, X_train,
            reps=reps, entanglement=entanglement, shots=shots
        )

        # SVC with precomputed kernel expects kernel matrix aligned with training samples
        y_pred = clf.predict(K_test)
        acc = accuracy_score(y_test, y_pred)
        worst_acc = min(worst_acc, acc)

    return worst_acc


def main():
    # Data
    X, y = make_toy_data(n=80, d=2, seed=2)

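    # Train/test split (simple). Shuffle first: make_toy_data returns samples
    # ordered by class, so an unshuffled tail would contain only class 1.
    rng = np.random.default_rng(0)
    perm = rng.permutation(len(X))
    X, y = X[perm], y[perm]
    n_train = 60
    X_train, X_test = X[:n_train], X[n_train:]
    y_train, y_test = y[:n_train], y[n_train:]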

    # Adversarial shift candidates
    d = X.shape[1]
    shifts = shift_adversary_candidates(d, eps=0.14)

    # Grid search over niche hyperparameters:
    # reps: repetition depth of the feature map
    # entanglement: scaling of entangling rotations
    # C: SVM regularization
    grid = list(product([1, 2, 3], [0.1, 0.5, 0.9], [0.5, 1.0, 2.0]))

    shots = 1024  # fewer shots per circuit keeps the demo cheaper, at the cost of noisier kernels
    best = None

    print("Running adversarial shift calibration grid search...\n")
    for reps, entanglement, C in grid:
        worst_acc = evaluate_under_shift(
            X_train, y_train, X_test, y_test,
            reps=reps,
            entanglement=entanglement,
            C=C,
            shots=shots,
            shifts=shifts
        )

        print(f"reps={reps:>2}, ent={entanglement:.2f}, C={C:.1f} -> worst_acc={worst_acc:.3f}")

        if best is None or worst_acc > best["worst_acc"]:
            best = {
                "reps": reps,
                "entanglement": entanglement,
                "C": C,
                "worst_acc": worst_acc
            }

    print("\nBest calibrated kernel settings:")
    print(best)

    # Report also the clean accuracy (delta=0) for context
    clean_worst = evaluate_under_shift(
        X_train, y_train, X_test, y_test,
        reps=best["reps"],
        entanglement=best["entanglement"],
        C=best["C"],
        shots=shots,
        shifts=[np.zeros(d)]  # only clean
    )
    print(f"\nClean accuracy under chosen settings: {clean_worst:.3f}")


if __name__ == "__main__":
    main()
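
The relation between the ancilla probability and the overlap that estimate_kernel_gram relies on can be derived by tracking the state through the SWAP test. A short sketch for two pure states (|\psi\rangle) and (|\phi\rangle), where CSWAP stands for the full register-wise controlled swap:

```latex
\begin{align}
|0\rangle|\psi\rangle|\phi\rangle
  &\xrightarrow{H} \tfrac{1}{\sqrt{2}}\bigl(|0\rangle + |1\rangle\bigr)|\psi\rangle|\phi\rangle \\
  &\xrightarrow{\text{CSWAP}} \tfrac{1}{\sqrt{2}}\bigl(|0\rangle|\psi\rangle|\phi\rangle
     + |1\rangle|\phi\rangle|\psi\rangle\bigr) \\
  &\xrightarrow{H} \tfrac{1}{2}|0\rangle\bigl(|\psi\rangle|\phi\rangle + |\phi\rangle|\psi\rangle\bigr)
     + \tfrac{1}{2}|1\rangle\bigl(|\psi\rangle|\phi\rangle - |\phi\rangle|\psi\rangle\bigr), \\
p_0 &= \tfrac{1}{4}\bigl\| |\psi\rangle|\phi\rangle + |\phi\rangle|\psi\rangle \bigr\|^2
     = \tfrac{1}{2}\bigl(1 + |\langle\psi|\phi\rangle|^2\bigr)
\quad\Longrightarrow\quad
|\langle\psi|\phi\rangle|^2 = 2p_0 - 1 .
\end{align}
```

With a finite number of shots the estimate of (p_0) can fall below 1/2, which is why the code clips (2p_0 - 1) to the interval [0, 1].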

What each important section does (and why)

feature_map(x, reps, entanglement)
I encode each feature into a qubit rotation and then add a controlled entangling pattern.
The two hyperparameters are intentionally small and interpretable:
- reps increases the circuit depth of the encoding.
- entanglement scales how strongly the circuit mixes features.

estimate_kernel_gram(...)
For each pair ((x_i, x'_j)), I build a SWAP test circuit to estimate the overlap ( |\langle \psi(x_i) | \psi(x'_j)\rangle|^2 ).
This produces the kernel Gram matrix needed by the SVM.

evaluate_under_shift(...)
This is the adversarial calibration loop:
- Train the kernel SVM on clean training data.
- Test on shifted versions of the test data.
- Take worst-case accuracy across shift candidates.

Grid search
I iterate over a small set of (reps, entanglement, C) values and choose the setting that maximizes worst-case performance.
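
To get a feel for what this grid search costs, here is a rough circuit count. It is a sketch with hypothetical helper names, assuming one SWAP-test circuit per Gram-matrix entry (as in the code above) and the demo sizes: 60 training points, 20 test points, 7 candidate shifts for d=2, and 3 x 3 x 3 settings.

```python
def circuits_per_setting(n_train, n_test, n_shifts):
    """SWAP-test circuits needed to score one (reps, entanglement, C) setting.

    One circuit per Gram-matrix entry: an n_train x n_train training kernel,
    plus an n_test x n_train test kernel for every candidate shift.
    """
    return n_train * n_train + n_shifts * n_test * n_train


def circuits_for_grid(n_settings, n_train, n_test, n_shifts):
    """Total circuits when every setting recomputes all of its kernels."""
    return n_settings * circuits_per_setting(n_train, n_test, n_shifts)


def circuits_with_kernel_cache(n_kernel_settings, n_train, n_test, n_shifts):
    """Total circuits if kernels are cached per (reps, entanglement) pair.

    The kernel matrices do not depend on the SVM parameter C, so they could
    be reused across all C values.
    """
    return n_kernel_settings * circuits_per_setting(n_train, n_test, n_shifts)


per_setting = circuits_per_setting(60, 20, 7)        # 12000
total = circuits_for_grid(27, 60, 20, 7)             # 324000
cached = circuits_with_kernel_cache(9, 60, 20, 7)    # 108000
```

Because C only affects the SVM fit, caching the kernels per (reps, entanglement) pair would cut the circuit count by a factor of three in this grid. The code above does not do that; it is one possible optimization.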

What I observed when I ran this over the weekend

In my runs, I consistently saw a pattern:

Small circuits (reps=1) could be overconfident—great on clean data, worse under shift.
Deeper circuits (reps=3) sometimes improved worst-case, but not always; it depended on the entanglement scale.
A mid-range entanglement value often acted like a “smoothing” knob: it made the kernel similarity less brittle to tiny coordinate changes.

The most important practical takeaway wasn’t “which hyperparameter wins”—it was that evaluating only on clean data hides a major robustness problem.

Practical notes (so this doesn’t derail anyone)

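This demo uses shot-based sampling (shots=1024). Low shot counts add sampling noise to every kernel entry. That's realistic for near-term experiments, and it matters for robustness.
Kernel matrices are expensive: a Gram matrix over n points costs roughly (O(n^2)) circuit evaluations, and the robustness check adds one test-versus-train kernel per candidate shift. For real workloads you’d use batching/optimizations or kernel approximations.
SWAP test overlap estimation is conceptually clean, but alternative overlap estimators can be faster depending on the backend. The compute-uncompute (inversion) test, for example, needs only d qubits instead of 2d + 1.
The grid search picks hyperparameters using worst-case accuracy on the same test split that it reports, so the reported number is optimistic. A separate validation split would give a fairer estimate.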
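
The shot-noise point above can be quantified. The ancilla outcome is a Bernoulli variable, so the estimate of p0 is a binomial proportion, and the kernel entry 2 * p0 - 1 inherits twice its standard error. A minimal sketch (helper names are hypothetical, and it ignores the clipping to [0, 1] and any hardware noise):

```python
import math


def swap_test_p0(overlap_sq):
    """Ancilla probability of outcome 0 for a given squared overlap."""
    return 0.5 * (1.0 + overlap_sq)


def kernel_entry_std_error(overlap_sq, shots):
    """Approximate standard error of the estimate 2 * p0_hat - 1.

    p0_hat is a binomial proportion, so its standard error is
    sqrt(p0 * (1 - p0) / shots); the factor 2 comes from the rescaling.
    """
    p0 = swap_test_p0(overlap_sq)
    return 2.0 * math.sqrt(p0 * (1.0 - p0) / shots)


# Quadrupling the shots halves the standard error.
for shots in (256, 1024, 4096):
    print(shots, round(kernel_entry_std_error(0.5, shots), 4))
```

At a true overlap of 0.5 and shots=1024 this gives a standard error of roughly 0.027 per kernel entry, which is worth comparing against the change in kernel value that a small shift produces.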

Conclusion

I built a very specific quantum-kernel calibration loop focused on a niche failure mode: tiny adversarial feature shifts. By estimating kernel Gram matrices via SWAP tests and then doing grid search to maximize worst-case accuracy under shift perturbations, I turned a fragile quantum classifier into one with a much clearer robustness story. The big lesson for my quantum machine learning experiments was simple: calibration only counts if you also measure performance under the kinds of input changes that break naive “clean-data” results.