AI Agent Security — Attacks, Jailbreaking, and Defense · System Prompt Security and Data Extraction
Sensitive information disclosure: PII, API keys, internal configs in LLM outputs
Introduction
An LLM can disclose confidential data not only through system prompt extraction but also in responses to seemingly innocent questions, whenever the model has access to user data, knowledge bases, or tools. This lesson covers three main disclosure categories: PII (users' personal data), API keys and credentials, and internal system configurations. We analyze the mechanisms by which LLMs leak this data, and the legal (GDPR, CCPA) and security consequences of such leaks.
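To make the three categories concrete, here is a minimal sketch of an output scanner that redacts one example of each before a response reaches the user. Everything in it is an assumption for illustration: the `PATTERNS` table, the `redact` helper, the `sk-`/`pk-` key prefix, and the `.internal` hostname suffix are hypothetical and far narrower than what a production detector would need.

```python
import re

# Hypothetical, deliberately narrow patterns, one per disclosure category.
# Real detectors need much broader coverage (names, phone numbers, cloud key formats, ...).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # PII
    "API_KEY": re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{16,}\b"),                # assumed key format
    "INTERNAL_HOST": re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.internal\b"),  # assumed suffix
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace every match with a placeholder and report which categories fired."""
    findings = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED_{label}]", text)
        if count:
            findings.append(label)
    return text, findings


if __name__ == "__main__":
    sample = "Contact alice@example.com, key sk-abcdef1234567890abcd, host db01.prod.internal."
    cleaned, hits = redact(sample)
    print(cleaned)
    print(hits)
```

Pattern matching of this kind only catches data with a recognizable shape, so it is best read as one layer of defense rather than a complete answer to the leaks this lesson describes.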