AI Agent Security — Attacks, Jailbreaking, and Defense · Prompt Injection — From Atomic Exploit to Multi-Stage Attack

Indirect prompt injection: when data is an instruction — RAG, documents, emails, web scrape

Prompt Injection — From Atomic Exploit to Multi-Stage Attack

Introduction

Indirect prompt injection (IPI) is an attack where the malicious instruction does not come from the user but is embedded in external data processed by the model — documents, emails, web pages, search results, RAG chunks. The model retrieves this data as "context" but effectively executes the attacker's hidden instruction. This lesson analyses the IPI mechanism in RAG systems, email/calendar agents, web-scraping pipelines and document analysis tools (Greshake et al. 2023, "Not What You've Signed Up For").