AI Agent Security — Attacks, Jailbreaking, and Defense · Guardrails and AI Firewall — Multi-Layer Defense

Input validation and output sanitization: what works, what does not — why blocklists fail

Guardrails and AI Firewall — Multi-Layer Defense

Introduction

The simplest approach to protecting an LLM — a list of forbidden words (blocklist) — sounds reasonable but fails in predictably consistent ways in practice. This lesson analyses effective input validation techniques (intent detection, ML classifiers, structural context constraints) and output sanitization (PII redaction, format verification, extrapolation detection), and above all explains why blocklists are fundamentally inadequate in generative systems.