AI Agent Security — Attacks, Jailbreaking, and Defense · How an AI Agent Attack Works — Mental Model and Threat Map
Direct attacker vs indirect attacker — fundamental difference
How an AI Agent Attack Works — Mental Model and Threat Map
Introduction
The distinction between direct and indirect attackers is one of the most important divisions in AI agent security. A direct attacker interacts with the model directly as a user — can use the user turn but is visible and subject to rate limiting. An indirect attacker never touches the agent interface — instead they control environmental data that the agent consumes (web pages, documents, API responses, emails). This lesson analyzes the mechanisms, vectors, effectiveness, and defense for each of these two attack archetypes.