AI Agent Security — Attacks, Jailbreaking, and Defense · How an AI Agent Attack Works — Mental Model and Threat Map
Agent architecture as attack surface: LLM + tools + memory + orchestrator
How an AI Agent Attack Works — Mental Model and Threat Map
Introduction
An AI agent is not just a language model — it is a system of four interrelated components: LLM (decision core), tools (capability to act in the world), memory (short- and long-term context), and orchestrator (code coordinating the flow). Each component is a separate attack surface with its own threat class. This lesson systematically maps what can be attacked in each component, how components interact (attacking one may compromise others), and what the implications are for designing resilient agents.