Reflexion
Replaces neural weight updates with verbal reinforcement learning: the agent verbally reflects on error signals and stores its conclusions in episodic memory, enabling rapid adaptation without expensive fine-tuning.
In each trial, the agent performs a task and receives a feedback signal. A reflection module (the same LLM) analyzes the feedback and generates a verbal reflection describing what went wrong and how to avoid it. The reflection is appended to an episodic memory buffer, and in the next trial the buffer is added to the agent's context, letting it reason over previous experience.
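A minimal sketch of this trial loop, in Python. `llm`, `run_task`, and `evaluate` are hypothetical stand-ins for the actor/reflector LLM, the task environment, and the feedback signal; they are not part of the paper's code.

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    success: bool  # did the trial succeed?
    detail: str    # textual description of the error signal

def reflexion_loop(task, llm, run_task, evaluate, max_trials=5):
    memory = []  # episodic memory buffer of verbal reflections
    attempt = None
    for _ in range(max_trials):
        # Reflections from earlier trials are appended to the context.
        context = task
        if memory:
            context += "\n\nLessons from previous trials:\n" + "\n".join(memory)
        attempt = run_task(llm, context)   # actor performs the task
        feedback = evaluate(attempt)       # external feedback signal
        if feedback.success:
            break
        # The same LLM acts as the reflection module: it analyzes the
        # feedback and verbalizes what went wrong and how to avoid it.
        reflection = llm(
            f"Task: {task}\nAttempt: {attempt}\nFeedback: {feedback.detail}\n"
            "Explain the error and how to avoid it in the next trial."
        )
        memory.append(reflection)          # store in episodic memory
    return attempt
```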
Traditional reinforcement learning methods require a large number of trials and expensive model fine-tuning; LLM agents should be able to learn quickly from trial-and-error without parameter modification.
Common pitfalls
Context length limits episodic memory · HIGH
As reflections accumulate, the context window fills up, limiting the number of trials from which the agent can learn.
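A common mitigation, sketched below, is to keep only the most recent reflections so the buffer stays within the context budget, trading learning horizon for context space. The cap value is an illustrative assumption, not from the paper.

```python
MAX_REFLECTIONS = 3  # illustrative cap; tune to the model's context window

def add_reflection(memory, reflection):
    # Append the new reflection, then drop the oldest entries so the
    # buffer fits in context (a summarization step could retain them).
    memory.append(reflection)
    return memory[-MAX_REFLECTIONS:]
```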
Reflections can reinforce wrong beliefs · MEDIUM
If the agent generates incorrect reflections (e.g., misattributing the cause of a failure), subsequent trials may be steered in the wrong direction.
Reference implementations
GENESIS · Source paper
Reflexion: Language Agents with Verbal Reinforcement Learning

ReAct - combining reasoning and acting
Yao et al. propose ReAct, interleaving reasoning traces with actions; a precursor to Reflexion.
Reflexion (Shinn et al., NeurIPS 2023) · breakthrough
Shinn et al. introduce verbal reflection with episodic memory as a substitute for RL fine-tuning.
Reflexion integrated into agent frameworks
Reflexion-style reflection is adopted in LangChain, AutoGen, and other agent frameworks.
BUILT ON
ReAct
ReAct (Reasoning + Acting) is a prompting pattern introduced by Yao et al. (ICLR 2023) in which a large language model generates an interleaved sequence of three token types: Thought (a natural-language reasoning step), Action (a tool invocation, e.g. search, calculator, API), and Observation (the tool's returned result). The Thought→Action→Observation loop repeats until the model emits a Finish action that produces the final answer.

ReAct resolves a fundamental limitation of pure Chain-of-Thought: CoT generates reasoning solely from model parameters and is prone to factual hallucination. Pure tool-using LLMs (e.g. Toolformer) execute actions but lack explicit planning. ReAct integrates both: reasoning guides action selection, and observations update the reasoning state in subsequent steps.

In the original paper, Yao et al. evaluated ReAct on four task classes: HotpotQA and FEVER (multi-hop QA and fact-checking with Wikipedia access), ALFWorld (interactive tasks in a simulated text environment), and WebShop (online shopping). On HotpotQA, ReAct reaches 27.4% EM, roughly on par with CoT; its key advantage is hallucination reduction through Wikipedia fact verification. On ALFWorld, ReAct achieves 71% success vs 37% for the best imitation-learning baseline.

ReAct became the dominant pattern for LLM agents in 2023–2024 and is built into most agent frameworks: LangChain (AgentExecutor), LlamaIndex (ReActAgent), AutoGPT, BabyAGI, OpenAI Assistants API. Later extensions include Reflexion (Shinn et al. 2023, self-criticism across episodes), Tree of Thoughts (Yao et al. 2023, tree-structured search instead of a linear trajectory), and native function calling in model APIs (OpenAI, Anthropic, Google), which internalizes the ReAct loop at the protocol level.
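The loop itself is small. Below is a minimal sketch, assuming a hypothetical `llm` completion function and a `tools` dict mapping tool names to callables; the regex-based Action parsing is a simplification of what real frameworks do.

```python
import re

def react(question, llm, tools, max_steps=8):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The model continues the transcript with a Thought followed by
        # an Action such as "Action: search[query]" or "Action: Finish[answer]".
        step = llm(transcript + "Thought:")
        transcript += "Thought:" + step + "\n"
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if not match:
            continue
        action, arg = match.group(1), match.group(2)
        if action == "Finish":
            return arg  # final answer terminates the loop
        if action not in tools:
            transcript += f"Observation: unknown tool '{action}'\n"
            continue
        # Execute the tool and feed its result back as an Observation,
        # which conditions the next Thought.
        observation = tools[action](arg)
        transcript += f"Observation: {observation}\n"
    return "No answer within step budget"
```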
| Title | Publisher | Type |
|---|---|---|
| Reflexion: Language Agents with Verbal Reinforcement Learning | — | scientific article |