What is ReAct?
ReAct (Reasoning and Acting) is a prompting paradigm and agent architecture, not an AI model or a library. Strictly speaking, ReAct is not something you install — it is a way of orchestrating a large language model (LLM) so that, instead of producing a single long answer, the model alternates between thinking and acting on the outside world.
In typical LLM usage the user gets one answer generated in a single API call, drawn entirely from what the model's trained weights encode. ReAct decomposes that into a multi-turn sequence: the model produces a Thought (a plan, a hypothesis, or an admission of missing information), picks an Action from a defined toolkit (search, database, API, calculator, terminal), and the external environment returns an Observation that the model then reasons over. The loop repeats until the model decides it has enough information and returns a Final Answer.
The key technical property of ReAct: thoughts do not change the environment, actions do. Thoughts update the internal state of the model (its context); actions trigger external effects. That dual track is what separates ReAct from plain text generation.
Who is behind it?
ReAct was introduced in the paper "ReAct: Synergizing Reasoning and Acting in Language Models", posted as a preprint in October 2022 and presented at ICLR 2023. The authors are Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan and Yuan Cao — a team from Princeton University and Google Research.
Yao and colleagues argued that prior work treated reasoning (e.g. Chain-of-Thought) and acting (e.g. reinforcement-learning agents navigating environments) as separate research tracks. Their proposal interleaved the two tightly: reasoning lets the model plan and revise plans, while acting lets it reach for facts that are not in the weights. They evaluated it on four benchmarks: HotpotQA (multi-hop open-domain question answering, where the correct answer requires reasoning over two or more Wikipedia articles), FEVER (Fact Extraction and VERification, where a claim is classified as supported, refuted, or not enough information based on Wikipedia articles), ALFWorld (a text-based environment built on the ALFRED benchmark, where an agent completes household tasks by navigating a virtual home) and WebShop (a simulated e-commerce environment where an agent must find and purchase a product matching specific requirements). On HotpotQA and FEVER, ReAct reduced the hallucination and error propagation seen in pure Chain-of-Thought; on ALFWorld and WebShop it beat the imitation-learning and reinforcement-learning baselines by 34 and 10 percentage points of absolute success rate, respectively.
How does it work?
The ReAct loop is built on three steps that iterate until a Final Answer is produced.
- Thought. The model writes an internal monologue — effectively a scratchpad — analysing the goal, summarising what it already knows, and deciding what is missing. The reasoning trace is visible in plain text, which makes the agent’s behaviour legible and debuggable.
- Action. Based on the thought, the model picks one tool from a predefined toolkit and supplies arguments. In the model output this looks like a structured command, for example Action: Search[capital of France]. Crucially, the model does not generate the action’s result itself: that is the job of the runtime (LangChain, LangGraph, custom code), which intercepts the action, performs the real tool call, and returns the result.
- Observation. The tool result is appended to the model’s context as the next chunk of text. The model “reads” the observation, be it a Wikipedia snippet, a numeric value, an HTTP error or a stack trace, and incorporates it into the next thought.
The loop closes when a new thought concludes that the data is sufficient. The model then produces Final Answer: instead of Action:, and the runtime stops iterating. In practice, frameworks impose a hard iteration cap (typically 6–10) to bound latency and avoid infinite loops.
Figure (ReAct loop diagram). How to read it: every Action crosses the boundary into the runtime, which executes the tool and feeds the Observation back into the model’s context. The model is not meant to write observations itself; it alternates thinking and acting until it decides the data is enough, and then produces a Final Answer instead of another action. The iteration counter shows the hard cap guarding against an infinite loop.
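The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any framework's actual implementation: `scripted_llm` is a hypothetical stand-in for a real model call, `TOOLS` is a made-up one-tool toolkit, and the regular expressions assume the `Action: Tool[input]` format used in the example above.

```python
import re
from typing import Callable, Dict

# Hypothetical toolkit: plain callables keyed by tool name.
TOOLS: Dict[str, Callable[[str], str]] = {
    "Search": lambda q: "Paris is the capital of France." if "France" in q else "No results.",
}

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]", re.DOTALL)
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def scripted_llm(prompt: str) -> str:
    """Stand-in for a real model: acts first, answers once an observation exists."""
    if "Observation:" not in prompt:
        return "Thought: I need to look this up.\nAction: Search[capital of France]"
    return "Thought: The observation answers the question.\nFinal Answer: Paris"


def react_loop(question: str, llm: Callable[[str], str], max_iters: int = 8) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_iters):  # hard iteration cap
        output = llm(scratchpad)
        scratchpad += output + "\n"

        final = FINAL_RE.search(output)
        if final:  # the model decided it has enough information
            return final.group(1).strip()

        action = ACTION_RE.search(output)
        if action is None:
            scratchpad += "Observation: could not parse an action; use Action: Tool[input].\n"
            continue

        name, arg = action.group(1), action.group(2)
        tool = TOOLS.get(name)
        # The runtime, not the model, produces the observation.
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        scratchpad += f"Observation: {observation}\n"
    return "Stopped: iteration cap reached without a final answer."


print(react_loop("What is the capital of France?", scripted_llm))  # prints: Paris
```

With a real model, runtimes commonly also pass a stop sequence such as "Observation:" so that generation halts before the model can write an observation of its own.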
What are its key components?
A working ReAct agent is usually a composition of five parts:
- LLM: the decision engine generating thoughts, actions and final answers.
- Toolkit: a set of callable functions, such as search engines, REST APIs, SQL queries, calculators, code interpreters and RAG retrievers.
- System prompt: the instruction enforcing the Thought / Action / Action Input / Observation / Final Answer format and listing the available tools.
- Runtime / orchestrator: the code that reads the model's output, detects the Action: directive, executes the tool, injects the observation into the context and re-invokes the LLM. The most common implementations are LangChain, LangGraph, CrewAI and Microsoft AutoGen.
- Scratchpad: the growing transcript of past thoughts, actions and observations, concatenated into every subsequent prompt.
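To make the toolkit and system prompt concrete, here is a minimal sketch of how a runtime might derive the prompt from a tool registry. The `Tool` dataclass, the template wording and both example tools are assumptions made for illustration; real frameworks ship their own templates.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


# Illustrative template (an assumption, not a framework's official prompt).
PROMPT_TEMPLATE = """Answer the question using the tools below.

Tools:
{tool_lines}

Use exactly this format:
Thought: your reasoning about what to do next
Action: one of [{tool_names}]
Action Input: the input to the tool
Observation: the tool result (written by the runtime, never by you)
... (Thought / Action / Action Input / Observation may repeat)
Final Answer: the answer to the original question
"""


def build_system_prompt(tools: List[Tool]) -> str:
    """List every tool and pin down the output format the runtime will parse."""
    tool_lines = "\n".join(f"- {t.name}: {t.description}" for t in tools)
    tool_names = ", ".join(t.name for t in tools)
    return PROMPT_TEMPLATE.format(tool_lines=tool_lines, tool_names=tool_names)


# Hypothetical toolkit with stubbed implementations.
toolkit = [
    Tool("Search", "look up a fact on the web", lambda q: "stub result"),
    Tool("Calculator", "add numbers written as 'a + b'",
         lambda s: str(sum(float(x) for x in s.split("+")))),
]

print(build_system_prompt(toolkit))
```

Generating the tool list from the same registry the runtime dispatches on keeps the prompt and the executable toolkit from drifting apart.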
Formally, the original authors model the interaction as an extended Markov decision process: at every step t the model receives a context trajectory c_t = (x, r_1, a_1, o_1, ..., r_{t−1}, a_{t−1}, o_{t−1}) and samples from p(r_t, a_t | c_t) — choosing whether to emit a new thought or a new action.
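The context trajectory c_t maps naturally onto a small data structure. The sketch below is one possible representation, with class and field names chosen for illustration: the question plays the role of x, and each completed step holds one (r_i, a_i, o_i) triple.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Step:
    thought: str      # r_i
    action: str       # a_i
    observation: str  # o_i


@dataclass
class Trajectory:
    question: str                                     # x
    steps: List[Step] = field(default_factory=list)   # completed steps 1 .. t-1

    def context(self) -> str:
        """Render c_t: the question followed by every completed step."""
        lines = [f"Question: {self.question}"]
        for s in self.steps:
            lines += [
                f"Thought: {s.thought}",
                f"Action: {s.action}",
                f"Observation: {s.observation}",
            ]
        return "\n".join(lines)


traj = Trajectory("What is the capital of France?")
traj.steps.append(Step("I should look this up.", "Search[capital of France]",
                       "Paris is the capital of France."))
print(traj.context())
```

The rendered string is what the model conditions on when it samples the next thought or action.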
What can it be used for?
ReAct shines wherever an LLM needs to fetch data that is not in its training data or act on the outside world. The four most common use cases in 2024–2025:
- Customer support and ticket triage. The agent reads a complaint, queries the CRM for purchase history, checks shipping status, and either triggers the refund API automatically or escalates to a human with a clean summary.
- Data analytics and financial reporting. The agent pulls market data via API, runs SQL queries, executes Python scripts for statistics, and synthesises a report — verifying intermediate numbers across iterations.
- Information retrieval in critical domains. In medicine and law, a ReAct agent grounds answers in current literature or case law instead of hallucinating, materially reducing the risk of stale or wrong recommendations.
- Software engineering agents. Tools like Cursor, Devin and Claude Code implement ReAct in many of their loops: the agent writes code (Action), runs tests (Action), reads the compiler error (Observation), patches the code (Thought + Action) and iterates until the build is green.
How does it differ from other approaches?
The most direct comparison is Chain-of-Thought (CoT). CoT is also a prompting paradigm, but it runs in a single model call and lives entirely inside the model's internal knowledge — the model emits a chain of reasoning steps, then the answer, with no contact with the world. If it adopts a wrong assumption early, the rest is confidently wrong.
ReAct keeps the "think step by step" advantage and adds an empirical-verification layer. CoT teaches the model how to think; ReAct additionally teaches it when to stop thinking and check reality. The price is higher latency (six or seven round-trips to the API instead of one) and heavier token consumption, since every iteration grows the scratchpad.
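A back-of-the-envelope calculation shows why the growing scratchpad matters. The numbers below (a 500-token base prompt and 400 tokens added per iteration) are illustrative assumptions, not measurements.

```python
def cumulative_input_tokens(base: int, per_step: int, iterations: int) -> int:
    """Total prompt tokens sent across a run.

    Call k (counting from 0) resends the base prompt plus the k steps
    already recorded in the scratchpad.
    """
    return sum(base + k * per_step for k in range(iterations))


single_call = cumulative_input_tokens(base=500, per_step=400, iterations=1)
six_calls = cumulative_input_tokens(base=500, per_step=400, iterations=6)
print(single_call, six_calls)  # prints: 500 9000
```

Under these assumptions six iterations send 18 times the input tokens of a single call, because the total grows roughly with the square of the iteration count rather than linearly.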
CoT and ReAct are not mutually exclusive. The original Yao et al. paper already showed that the best results come from a hybrid approach: the model uses CoT for internal logic and switches to ReAct when it needs facts. Modern reasoning models (OpenAI o3, GPT-5 thinking, Claude Sonnet 4.5 with extended thinking) internalise CoT but still operate inside a ReAct-shaped loop whenever they have tool access.
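One way such a hybrid could be wired is a confidence check on repeated CoT samples, with a fall-back to the tool-using agent when the samples disagree. The sketch below is an illustration in that spirit, not a reproduction of the paper's setup; the function names, the sample count and the majority threshold are all assumptions.

```python
from collections import Counter
from typing import Callable


def hybrid_answer(
    question: str,
    cot_sample: Callable[[str], str],   # hypothetical: returns one CoT answer
    react_agent: Callable[[str], str],  # hypothetical: runs a full ReAct loop
    n: int = 5,
) -> str:
    """Trust internal reasoning when samples agree; otherwise go check the world."""
    answers = [cot_sample(question) for _ in range(n)]
    top, votes = Counter(answers).most_common(1)[0]
    if votes > n / 2:  # a clear majority: accept the CoT answer
        return top
    return react_agent(question)  # low agreement: fall back to tools


# Usage with deterministic stubs in place of real model calls.
agreeing = iter(["Paris", "Paris", "Paris", "Lyon", "Nice"])
print(hybrid_answer("capital of France?", lambda q: next(agreeing), lambda q: "from tools"))
```

The cheap path costs n single calls; the expensive multi-step loop runs only when the model's own answers are inconsistent.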
ReAct is only one of several agent topologies.
- Plan-and-Execute — separates the planner role from the executors: an orchestrator drafts a full plan upfront, and dedicated sub-agents (often running simpler ReAct loops) carry out the steps.
- Tree-of-Thought — generates several alternative reasoning paths in parallel and picks the strongest.
- Reflexion — adds a self-critique layer after each ReAct run so the agent can learn from its own failures across executions.
ReAct is preferred for dynamic, exploratory tasks. Plan-and-Execute fits predictable pipelines.
Key limitations and challenges
ReAct ships with four recurring pitfalls every team must engineer around before going to production.
- Latency and cost. Every iteration is a separate LLM call, and each call resends the whole transcript so far. Six iterations therefore mean at least six times the latency of a single call, and because input tokens accumulate across calls, the bill often grows faster than the iteration count.
- Infinite loops. If a tool keeps returning the same error or the same useless result, the model is prone to call it repeatedly with the same arguments. The standard mitigation is a hard iteration cap and a duplicate-call detector.
- Context overflow. Raw observations, such as a full HTML page injected into the prompt, can blow past the context window in two or three iterations. The convention is to pre-filter tool results through RAG, summarisation, or strict structured schemas.
- Safety. The agent's actions can be irreversible — deleting a database row, executing a payment, sending an email. Production deployments add human-in-the-loop checkpoints on destructive actions and tool sandboxing that constrains what the agent is even allowed to call.
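Several of these mitigations can live in a thin wrapper around the toolkit. The sketch below is a minimal illustration under assumptions: the class name, the tool names, the character budget and the `approve` callback are all hypothetical, and plain truncation stands in for the RAG or summarisation step a production system would more likely use.

```python
from typing import Callable, Dict, Set, Tuple

MAX_OBSERVATION_CHARS = 2000  # assumed budget per observation
DESTRUCTIVE_TOOLS: Set[str] = {"delete_row", "send_payment", "send_email"}  # hypothetical names


class GuardedToolbox:
    def __init__(
        self,
        tools: Dict[str, Callable[[str], str]],
        approve: Callable[[str, str], bool],  # human-in-the-loop hook
    ) -> None:
        self.tools = tools
        self.approve = approve
        self.seen: Set[Tuple[str, str]] = set()

    def call(self, name: str, arg: str) -> str:
        if name not in self.tools:
            return f"Error: unknown tool {name!r}."
        # Duplicate-call detector: refuse an identical (tool, argument) pair.
        if (name, arg) in self.seen:
            return "Error: this exact call was already made; try different arguments or another tool."
        self.seen.add((name, arg))
        # Checkpoint before anything irreversible.
        if name in DESTRUCTIVE_TOOLS and not self.approve(name, arg):
            return "Error: action rejected by human reviewer."
        result = self.tools[name](arg)
        # Crude context-overflow guard.
        if len(result) > MAX_OBSERVATION_CHARS:
            result = result[:MAX_OBSERVATION_CHARS] + " [truncated]"
        return result


box = GuardedToolbox(
    {"fetch": lambda url: "x" * 5000, "send_email": lambda body: "sent"},
    approve=lambda name, arg: False,  # stub reviewer that rejects everything
)
print(len(box.call("fetch", "https://example.com")))
print(box.call("send_email", "hello"))
```

Returning errors as observations, rather than raising, lets the model read the refusal and choose a different action. Blocking every repeated call is a deliberately strict choice; a runtime that needs to re-read changing data would have to relax it.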
Why does it matter?
ReAct is best read not as a single algorithm but as a structural decision about how AI systems are built. Yao et al. cemented the idea that an LLM should not be a closed oracle (a black box that hands down a final answer from its own knowledge, without reaching for external data or checking it against the world) but one component of a control loop in which tools and observations from the world matter just as much.
The consequences are very practical:
- A shared vocabulary. ReAct established the terms Thought / Action / Observation / Final Answer, around which much of the agent-framework ecosystem has aligned: LangChain, LangGraph, CrewAI, AutoGen, and in some form Anthropic Computer Use, the OpenAI Responses API with tools, and Google’s Agent Development Kit.
- An answer to hallucinations. Instead of training a better model, give the model a way to check itself externally.
- A new bottleneck. The bottleneck for agent scaling today is not model quality but the quality of tools, observation schemas, safety policies, and error handling.
For engineering teams, ReAct is the agent equivalent of TCP/IP: a concrete, not-particularly-elegant, but practical contract that simply works and is simple enough that more advanced layers — planning, multi-agent collaboration, reflection — can be built on top of it. Even if the name "ReAct" disappears from marketing decks in a year, the reason–act–observe loop will still be sitting underneath.
ReAct is not a silver bullet and does not replace model training or specialised rule systems. It is, however, the best available way to make a large language model — which by nature hallucinates and lacks current data — operate in environments where the cost of being wrong is measurable.
Sources
- arXiv — Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models — arxiv.org/abs/2210.03629
- Google Research — ReAct project page — research.google/pubs/react-synergizing-reasoning-and-acting-in-language-models/
- Prompt Engineering Guide — ReAct Prompting — promptingguide.ai/techniques/react
- IBM — What is a ReAct agent? — ibm.com/think/topics/react-agent
- LangChain — ReAct agents / LangGraph documentation — langchain.com
