AI Agent Security — Attacks, Jailbreaking, and Defense · Prompt Injection — From Atomic Exploit to Multi-Stage Attack

SpAIware: persistent injection via agent memory (ChatGPT memories case)

Prompt Injection — From Atomic Exploit to Multi-Stage Attack

Introduction

SpAIware (term by Johann Rehberger, 2024) is a class of attacks in which a malicious instruction is PERSISTENTLY embedded in the system through the agent's memory mechanism — just as spyware infects an operating system, SpAIware infects the agent's context. The most prominent documented case is the ChatGPT Memories vulnerability (Rehberger 2024): IPI in a malicious document or web page wrote false instructions to the user's memory, affecting all future sessions. The lesson analyses the mechanism, vector, demonstration and mitigations of SpAIware.