AI Agent Security — Attacks, Jailbreaking, and Defense · Jailbreaking — When and Why Safety Alignment Fails

Jailbreak vs Prompt Injection — Where Model Responsibility Ends, Where Application Responsibility Begins

Jailbreaking — When and Why Safety Alignment Fails

Introduction

Jailbreak and prompt injection are two different classes of attacks with different responsibility boundaries: jailbreak is a user's attack on the model's safety policy, prompt injection is the injection of instructions from external data processed by an agent. This lesson precisely defines this boundary, analyses direct and indirect prompt injection in agentic systems, operator vs model responsibility, and implications for AI pipeline design.