AI Agent Security — Attacks, Jailbreaking, and Defense · How an AI Agent Attack Works — Mental Model and Threat Map

Threat modeling canvas for an agent with tools — practical exercise


Introduction

Threat modeling is a systematic process for identifying, assessing, and prioritizing threats to a system's architecture. For an AI agent with tools, a threat modeling "canvas" (an analysis framework) covers four dimensions: Assets (what must be protected), Threats (how the agent can be attacked), Vulnerabilities (which weaknesses the agent has), and Controls (which defense mechanisms to implement). This lesson covers practical threat modeling with STRIDE (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege) and PASTA (Process for Attack Simulation and Threat Analysis) applied to AI agents, mapping threats to agent components, and prioritizing threats through risk scoring.
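To make the four dimensions concrete, a canvas can be kept as a small data structure in which each threat records its STRIDE category, the agent component it maps to, the asset at risk, the vulnerability it exploits, and the planned controls. The Python sketch below assumes a simple risk score of likelihood × impact on 1 to 5 scales. That scoring rule is an assumption for illustration and not necessarily the one used later in the lesson. The components, threats, category assignments, and controls are hypothetical examples for an agent with a retrieval tool and an email tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stride(Enum):
    """The six STRIDE threat categories."""
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFO_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION = "Elevation of privilege"


@dataclass
class Threat:
    name: str
    category: Stride
    component: str       # agent component the threat maps to (hypothetical names)
    asset: str           # what must be protected
    vulnerability: str   # weakness that makes the threat possible
    likelihood: int      # assumed 1-5 scale
    impact: int          # assumed 1-5 scale
    controls: list[str] = field(default_factory=list)  # planned defenses

    @property
    def risk(self) -> int:
        # Assumed scoring rule: risk = likelihood x impact (range 1-25)
        return self.likelihood * self.impact


def prioritize(threats: list[Threat]) -> list[Threat]:
    """Return threats sorted from highest to lowest risk score."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)


# Hypothetical canvas; the STRIDE assignments are illustrative choices.
canvas = [
    Threat("Indirect prompt injection via retrieved document", Stride.TAMPERING,
           component="retrieval tool", asset="agent instructions",
           vulnerability="untrusted content mixed into the prompt",
           likelihood=4, impact=5,
           controls=["separate trusted and untrusted context", "tool-call allowlist"]),
    Threat("Exfiltration of customer data by email", Stride.INFO_DISCLOSURE,
           component="send_email tool", asset="customer records",
           vulnerability="no restrictions on recipients",
           likelihood=3, impact=5,
           controls=["recipient allowlist", "human approval for outbound mail"]),
    Threat("Unlogged tool actions", Stride.REPUDIATION,
           component="tool executor", asset="audit trail",
           vulnerability="tool calls are not recorded",
           likelihood=3, impact=2,
           controls=["append-only audit log"]),
]

# Print the prioritized threat list: score, category, component, threat name.
for t in prioritize(canvas):
    print(f"{t.risk:>2}  {t.category.value:<22} {t.component:<16} {t.name}")
```

Sorting by score yields the prioritized list (here 20, 15, then 6). In a real exercise the likelihood and impact values come from the team's own assessment, and a threat with no entry in `controls` is an immediate gap in the canvas.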