AI Agent Security: Attacks, Jailbreaking, and Defense · How an AI Agent Attack Works: Mental Model and Threat Map
Threat modeling canvas for an agent with tools: practical exercise
Introduction
Threat modeling is a systematic process for identifying, assessing, and prioritizing threats to a system's architecture. For an AI agent with tools, the threat modeling "canvas" (an analysis framework) covers four dimensions: Assets (what to protect), Threats (how the agent can be attacked), Vulnerabilities (which weaknesses the agent has), and Controls (which defense mechanisms to implement). This lesson teaches practical threat modeling with the STRIDE and PASTA methods applied to AI agents, and covers mapping threats to agent components and prioritizing threats through risk scoring.
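To make the four dimensions concrete, the canvas can be pictured as a small data structure. The sketch below is only illustrative: the class names (`Canvas`, `Threat`), the component names, the example threats, and the scoring rule (likelihood times impact, each on an assumed 1 to 5 scale) are assumptions made for this example, not something the lesson prescribes. Only the six STRIDE category names are standard.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stride(Enum):
    """The six standard STRIDE threat categories."""
    SPOOFING = "Spoofing"
    TAMPERING = "Tampering"
    REPUDIATION = "Repudiation"
    INFORMATION_DISCLOSURE = "Information disclosure"
    DENIAL_OF_SERVICE = "Denial of service"
    ELEVATION_OF_PRIVILEGE = "Elevation of privilege"


@dataclass
class Threat:
    name: str
    component: str   # agent component the threat maps to (hypothetical names)
    category: Stride
    likelihood: int  # assumed scale: 1 (rare) to 5 (expected)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)
    vulnerabilities: list[str] = field(default_factory=list)
    controls: list[str] = field(default_factory=list)

    @property
    def risk(self) -> int:
        # Assumed scoring rule for this sketch: likelihood x impact.
        return self.likelihood * self.impact


@dataclass
class Canvas:
    assets: list[str]
    threats: list[Threat] = field(default_factory=list)

    def prioritized(self) -> list[Threat]:
        """Return threats ordered from highest to lowest risk score."""
        return sorted(self.threats, key=lambda t: t.risk, reverse=True)


# Hypothetical example entries, invented for illustration only.
canvas = Canvas(
    assets=["customer records", "API credentials"],
    threats=[
        Threat("Unlogged tool call", "orchestrator", Stride.REPUDIATION, 3, 2,
               ["no audit trail"], ["append-only tool-call log"]),
        Threat("Prompt injection via retrieved document", "retrieval tool",
               Stride.TAMPERING, 4, 4,
               ["untrusted text reaches the prompt"], ["content filtering"]),
        Threat("Credential leak in tool output", "shell tool",
               Stride.INFORMATION_DISCLOSURE, 2, 5,
               ["secrets in environment"], ["output redaction"]),
    ],
)

for t in canvas.prioritized():
    print(f"{t.risk:>2}  {t.category.value:<24} {t.component:<15} {t.name}")
```

Keeping vulnerabilities and controls attached to each threat, rather than in separate lists, is one possible design choice; it makes it easy to see which high-scoring threats still lack a control.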