AI Agent Security — Attacks, Jailbreaking, and Defense · Jailbreaking — When and Why Safety Alignment Fails
Jailbreak vs Prompt Injection — Where Model Responsibility Ends, Where Application Responsibility Begins
Introduction
Jailbreaking and prompt injection are two distinct classes of attack, each with its own responsibility boundary. A jailbreak is a user's attack on the model's safety policy; prompt injection is the insertion of instructions through external data that an agent processes. This lesson defines that boundary precisely and analyses direct and indirect prompt injection in agentic systems, operator versus model responsibility, and the implications for AI pipeline design.