AI SECURITY · Python · PyTorch

Adversarial Attack Workbench

Research workbench for generating and analysing adversarial examples against a ResNet50 brain tumor MRI classifier — probing model robustness with FGSM, PGD, and DeepFool attacks.

PyTorch · Python · FGSM · PGD · DeepFool

Built a workbench to evaluate the adversarial robustness of a ResNet50 classifier trained to detect brain tumors across four classes (glioma, meningioma, pituitary, no tumor), evaluated on 1,600 test images.

Implemented three attack algorithms: FGSM (single-step L∞), PGD (iterative L∞), and DeepFool (minimum L2 perturbation). Three search strategies sit alongside them: fixed ε grids, binary search for the minimum fooling ε, and a sweep mode that finds the first ε at which the model's prediction flips.
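The two L∞ attacks and the binary search described above can be sketched roughly as follows. This is a minimal illustration, not the workbench's actual code: the function names (`pgd_attack`, `fgsm_attack`, `min_fooling_eps`), the default step size, the search bounds, and the assumption that pixel values are scaled to [0, 1] are all hypothetical. DeepFool is left out. The binary search also assumes that fooling is monotone in ε, which is a convenient approximation rather than a guarantee.

```python
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps, steps=10, step_size=None):
    """L-inf PGD without random start. Assumes inputs lie in [0, 1]
    and that the model is already in eval mode."""
    if step_size is None:
        # Hypothetical default: enough total movement to reach the ball's edge.
        step_size = eps if steps == 1 else 2.5 * eps / steps
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        (grad,) = torch.autograd.grad(loss, x_adv)
        # Ascend the loss along the sign of the input gradient.
        x_adv = x_adv.detach() + step_size * grad.sign()
        # Project back into the eps-ball around x, then into valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()


def fgsm_attack(model, x, y, eps):
    """FGSM is the single-step special case with step size eps."""
    return pgd_attack(model, x, y, eps, steps=1, step_size=eps)


def min_fooling_eps(model, x, y, attack, lo=0.0, hi=0.1, iters=12):
    """Binary search for the smallest eps in [lo, hi] that flips the prediction.

    Intended for a single image (batch of one). Returns None when even
    eps = hi does not fool the model. Assumes fooling is monotone in eps.
    """
    def fooled(eps):
        x_adv = attack(model, x, y, eps)
        with torch.no_grad():
            return bool((model(x_adv).argmax(dim=1) != y).all())

    if not fooled(hi):
        return None
    for _ in range(iters):
        mid = (lo + hi) / 2
        if fooled(mid):
            hi = mid  # still fooled: try a smaller budget
        else:
            lo = mid  # not fooled: need a larger budget
    return hi
```

A call such as `min_fooling_eps(model, x, y, fgsm_attack)` returns an upper bound on the minimum fooling ε that is accurate to roughly `(hi - lo) / 2**iters`. A fixed grid or a sweep would call the same attack functions over a list of ε values instead.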

Key findings: over 55% of images are misclassified at perturbations below the threshold of human perception; PGD reaches a 99.4% fool rate by ε = 0.005; pituitary is the most fragile class while glioma is the most robust; and high softmax confidence is not a reliable indicator of robustness.

Identified a clinically dangerous confusion pattern, glioma being misclassified as no tumor, and a dominant pituitary ↔ meningioma confusion pair, suggesting the model has conflated feature representations rather than learning fully separable decision boundaries.
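One way such confusion patterns could be tallied is sketched below. The helper names, the label string `notumor`, and the toy labels in the usage lines are assumptions for illustration; they are not results from the workbench.

```python
from collections import Counter


def confusion_pairs(true_labels, adv_preds):
    """Count (true, predicted) pairs for examples the attack pushed to a wrong label."""
    return Counter((t, p) for t, p in zip(true_labels, adv_preds) if t != p)


def symmetric_pairs(pairs):
    """Merge A->B and B->A counts to expose bidirectional confusion pairs."""
    merged = Counter()
    for (t, p), n in pairs.items():
        merged[frozenset((t, p))] += n
    return merged


def missed_tumors(pairs, healthy_label="notumor"):
    """Count tumor images whose adversarial prediction is the healthy class."""
    return sum(n for (t, p), n in pairs.items() if p == healthy_label)


# Toy labels, made up purely to show the call pattern.
true_labels = ["glioma", "pituitary", "meningioma", "pituitary", "glioma"]
adv_preds = ["notumor", "meningioma", "pituitary", "meningioma", "glioma"]

pairs = confusion_pairs(true_labels, adv_preds)
dominant_pair, dominant_count = symmetric_pairs(pairs).most_common(1)[0]
```

Separating the directed counts from the merged ones keeps the one-way, clinically risky case (a tumor read as healthy) distinct from a two-way confusion between tumor types.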

Includes a gallery renderer that produces three-panel comparison images (original | adversarial | perturbation heatmap) and a grid of the most vulnerable examples in the test set.
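The panel layout and the selection of vulnerable examples might look something like the sketch below. It assumes images are `(C, H, W)` tensors in [0, 1], renders the heatmap as a normalised grayscale map of per-pixel perturbation magnitude, and treats "most vulnerable" as "smallest minimum fooling ε". All three choices, along with the function names, are assumptions rather than the renderer's documented behaviour.

```python
import torch


def three_panel(x, x_adv):
    """Return a (C, H, 3*W) tensor: original | adversarial | perturbation heatmap."""
    # Per-pixel perturbation magnitude, summed over channels.
    delta = (x_adv - x).abs().sum(dim=0)
    # Normalise to [0, 1]; the clamp avoids division by zero when x_adv == x.
    heat = delta / delta.max().clamp_min(1e-12)
    # Replicate the single-channel heatmap so all panels share a shape.
    heat_img = heat.unsqueeze(0).expand_as(x)
    return torch.cat([x, x_adv, heat_img], dim=2)


def most_vulnerable(min_eps, k):
    """Indices of the k images with the smallest minimum fooling eps.

    min_eps is a 1-D tensor; images that were never fooled can be
    encoded as float('inf') so they sort last.
    """
    return torch.topk(min_eps, k, largest=False).indices
```

Because the heatmap is rescaled per image, it shows where the perturbation concentrates rather than how large it is in absolute terms.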