Agentic AI · Intermediate

AI Agent Security — Attacks, Jailbreaking, and Defense

7 Chapters · 39 Lessons

The course covers three layers of agentic system security: (1) attack taxonomy (direct and indirect prompt injection, low-cost jailbreaking techniques such as DAN and roleplay, model inversion, and data extraction); (2) defense mechanisms (input/output sanitization, instruction hierarchy enforcement, agent tool sandboxing, system prompt hardening, and an AI firewall as an external filtering layer); (3) secure multi-agent system design (privilege isolation, least-privilege tool access, auditing, and real-time anomaly monitoring). Prerequisites: familiarity with LLM APIs (OpenAI/Anthropic) and experience building at least one agentic system or production chatbot. The course does NOT cover cryptography, network infrastructure security, compliance and regulations (GDPR/AI Act), or adversarial ML (attacks on model weights). Graduates can assess the attack surface of their own agentic system, apply established defense patterns, and make informed trade-offs between security and agent utility.

Chapters

MODULE 01

How an AI Agent Attack Works — Mental Model and Threat Map

5 lessons

This chapter builds the fundamental mental model for attacking an AI agent: from LLM trust boundaries, through the anatomy of the attack surface (LLM, tools, memory, orchestrator), to the OWASP GenAI Top 10:2025 taxonomy and a practical threat modeling canvas.

MODULE 02

Prompt Injection — From Atomic Exploit to Multi-Stage Attack

6 lessons

This chapter covers the anatomy of prompt injection attacks: from direct and indirect vectors, through invisible injection techniques, multi-stage C2 scenarios, persistent infections via agent memory, to a hands-on scenario of attacking an agent with tool calling.

MODULE 03

Jailbreaking — When and Why Safety Alignment Fails

5 lessons

This chapter analyzes the failure modes of safety alignment: from competing objectives and mismatched generalization, through a taxonomy of jailbreak techniques, many-shot gradient-free attacks, and the distinction between jailbreaking and prompt injection, to application-side defenses (self-reminder, instruction hierarchy, and Constitutional AI).

MODULE 04

System Prompt Security and Data Extraction

5 lessons

This chapter covers system prompt attacks and LLM data leakage: prompt extraction, multi-step manipulation, PII and API key disclosure, training data extraction, and multilayer system prompt hardening.

MODULE 05

Agent Security with Tools and MCP

6 lessons

This chapter covers threats and protection mechanisms for tool-equipped AI agents: excessive agency (OWASP LLM06), Model Context Protocol (MCP) security, privilege escalation in multi-agent systems, human-in-the-loop configuration, and practical audit trail design.

MODULE 06

Guardrails and AI Firewall — Multi-Layer Defense

6 lessons

This chapter covers defense-in-depth architecture for AI systems: input/output filters, tools such as Llama Guard, NeMo Guardrails, and Guardrails AI, the Dual LLM pattern, agent sandboxing, and dynamic adaptation of guardrails against evolving attacks.

MODULE 07

Red Teaming, Monitoring, and Secure Design of Agentic Systems

6 lessons

This chapter covers the full offensive-defensive cycle: from planning and automating LLM red teaming, through building security evals in CI/CD and runtime monitoring, to a secure design checklist for agentic systems and a final security assessment scenario.