AI Agent Security — Attacks, Jailbreaking, and Defense · System Prompt Security and Data Extraction

System prompt hardening: what works, what does not — a multilayer approach to protection

Introduction

Previous lessons showed why no single technique for protecting a system prompt is sufficient. This lesson consolidates that material into a practical hardening approach built from three layers: prompt-level techniques (what to write, what not to write, and how to define the model's identity), application-level controls (guardrails, output filtering, rate limiting), and architectural principles for minimizing risk (assume breach, least privilege, defense in depth). It also covers what does NOT work, so that effort is not wasted on security theater.