AI Agent Security — Attacks, Jailbreaking, and Defense · Prompt Injection — From Atomic Exploit to Multi-Stage Attack
Multi-stage and deferred attacks: context pollution and C2 via LLM
Prompt Injection — From Atomic Exploit to Multi-Stage Attack
Introduction
Multi-stage prompt injection attacks are scenarios where a malicious instruction does not execute immediately but gradually infects the agent's context (context pollution) or establishes a command-and-control (C2) channel through the language model itself. The lesson analyses the architecture of such attacks: how an attacker can "plant" an instruction several steps before execution, how an LLM can be used as a C2 intermediary bypassing traditional network filters, and why multi-stage attacks drastically complicate detection and attribution.