Reasoning

TAO Loop

2022 · Active · Published: 5 June 2026 · Updated: 5 June 2026

Key innovation

Formalises an LLM agent's loop as three explicit phases — Thought (internal reasoning), Action (tool invocation) and Observation (the tool result fed back into context) — turning ad-hoc prompting into a predictable, debuggable control cycle.

How it works

On each agent iteration the model produces a structured step with three sections. (1) Thought: a short natural-language reasoning note about what the model knows and what it plans next. (2) Action: a tool invocation in a fixed format (JSON, function call, or a special token like "Action: search[query]"). (3) Observation: the action's result (API return, document chunk, error) injected into context as a separate message or text block. The model sees the full (T,A,O,T,A,O,...) history on the next turn and either emits another TAO step or produces a "Final Answer". The loop is bounded by max_iterations, a token budget and the Final-Answer stop rule.
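The cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the text format "Action: tool[input]" and the "Final Answer:" stop rule, and the names run_tao_loop, scripted_llm and fake_search are hypothetical stand-ins (scripted_llm replaces a real model call). Only max_iterations is enforced here; a token budget is left out for brevity.

```python
import re
from typing import Callable, Dict, List

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]", re.DOTALL)
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def run_tao_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_iterations: int = 5,
) -> str:
    """Run Thought/Action/Observation steps until a Final Answer or the cap."""
    history: List[str] = [f"Question: {question}"]
    for _ in range(max_iterations):
        # The model sees the full history and emits Thought + Action,
        # or a Final Answer.
        step = llm("\n".join(history))
        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            # Neither an Action nor a Final Answer: feed a format hint back.
            history.append(step)
            history.append("Observation: no Action found; use Action: tool[input].")
            continue
        name, arg = action.group(1), action.group(2)
        tool = tools.get(name)
        # The Observation comes only from the real tool call, never from model text.
        observation = tool(arg) if tool else f"unknown tool '{name}'"
        history.append(step[: action.end()])  # drop anything after the Action
        history.append(f"Observation: {observation}")
    return "Stopped: max_iterations reached without a Final Answer."


def scripted_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: acts once, then answers."""
    if "Observation:" not in prompt:
        return "Thought: I should look this up.\nAction: search[capital of France]"
    return "Thought: The observation answers it.\nFinal Answer: Paris"


def fake_search(query: str) -> str:
    """Hypothetical tool that returns a canned result."""
    return "Paris is the capital of France."


if __name__ == "__main__":
    print(run_tao_loop(scripted_llm, {"search": fake_search}, "Capital of France?"))
```

In this sketch the history is a flat list of strings; with a chat API each entry would instead be a message with its own role.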

Problem solved

A bare LLM, even with function calling, often jumps straight to action, loses the thread or hallucinates a tool result before the real one arrives. TAO enforces a three-phase discipline: reason, act, observe — separating reasoning from execution, easing debugging and enabling multi-step problem solving with tools.

Implementation

Reference implementations

ReAct Prompting (LangChain)

Python · LangChain

LangGraph Agent Loop

Python / TypeScript · LangChain

OpenAI Agents SDK

Python · OpenAI

Official

Hugging Face Agents Course — TAO Cycle

Docs / Python · Hugging Face

ReAct reference repo

Python · Shunyu Yao

Official

Implementation pitfalls

Hallucinated Observation · Critical

The model emits an Action and a fake Observation in the same turn, before the tool actually replies. Mitigation: set a stop sequence so generation halts after the Action, parse the response and discard anything after the Action, and accept Observations only from messages with role "tool".

Fix: Stop tokens after Action, a parser, and injecting Observation as a separate tool-role message.
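The parser-side part of this mitigation might look like the sketch below. It assumes the text format in which the model would write a line starting with "Observation:"; the function name strip_hallucinated_observation is hypothetical. A stop sequence on the model call is a complementary measure, not a replacement for this check.

```python
import re

# A line that begins with "Observation:" inside model output is, by
# construction, model-written and must not be trusted.
OBSERVATION_MARKER = re.compile(r"^[ \t]*Observation:", re.MULTILINE)


def strip_hallucinated_observation(model_output: str) -> str:
    """Keep only the text before the first model-written Observation line."""
    match = OBSERVATION_MARKER.search(model_output)
    if match is None:
        return model_output.strip()
    return model_output[: match.start()].strip()
```

The real Observation is then appended by the application after the tool has actually run.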

No-progress loops · High

The agent repeats near-identical Actions and never progresses. Mitigation: action deduplication, a hard max_iterations cap, a watchdog detecting lack of new observations, optionally a Reflexion phase.

Fix: Deduplication, max_iterations, watchdog, Reflexion.
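A deduplicating watchdog could be sketched as follows. The class name ProgressWatchdog and the max_repeats parameter are hypothetical. This version only catches Actions that are identical after case and whitespace normalisation; catching semantically similar Actions would need a fuzzier comparison.

```python
from collections import Counter
from typing import Tuple


class ProgressWatchdog:
    """Flags an agent that keeps issuing the same Action."""

    def __init__(self, max_repeats: int = 2) -> None:
        self.max_repeats = max_repeats
        self._seen: Counter = Counter()

    @staticmethod
    def _normalise(tool: str, tool_input: str) -> Tuple[str, str]:
        # Lower-case and collapse whitespace so trivial variations match.
        return tool.strip().lower(), " ".join(tool_input.lower().split())

    def should_stop(self, tool: str, tool_input: str) -> bool:
        """Record the Action; return True once it exceeds max_repeats."""
        key = self._normalise(tool, tool_input)
        self._seen[key] += 1
        return self._seen[key] > self.max_repeats
```

The loop would call should_stop before executing each Action and, on True, either abort or hand control to a reflection step.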

Context bloat from long Observations · High

A tool result (a whole web page, a SQL result with hundreds of rows) fills the context window in 2–3 iterations. Mitigation: truncate Observation, summarise, retrieval-on-result, pagination.

Fix: Truncation, summarisation, pagination, retrieval-on-result.
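Truncation is the simplest of these mitigations. The sketch below keeps the head and tail of an Observation and notes how much was dropped. The function name truncate_observation and the max_chars default are assumptions, and the limit is counted in characters rather than tokens, which a real agent would more likely budget.

```python
def truncate_observation(text: str, max_chars: int = 2000) -> str:
    """Keep the start and end of a long tool result, marking the omission."""
    if max_chars < 2:
        raise ValueError("max_chars must be at least 2")
    if len(text) <= max_chars:
        return text
    head = max_chars // 2
    tail = max_chars - head
    omitted = len(text) - max_chars
    # The marker tells the model that content is missing, so it can ask
    # for another page instead of assuming it saw everything.
    return f"{text[:head]}\n[... {omitted} characters omitted ...]\n{text[-tail:]}"
```

Keeping both ends is a judgment call: error messages and totals often sit at the end of a tool result, while headers sit at the start.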

Confusing Thought with Final Answer · Medium

The model puts final conclusions inside Thought instead of emitting "Final Answer", so the app never ends the loop. Mitigation: crisp rules in the system prompt, few-shot examples with a proper Final Answer, a parser enforcing the format.

Fix: A clear prompt, few-shot, a parser, fallback after max_iterations.
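A format-enforcing parser with a fallback might be sketched like this. It assumes the "Action: tool[input]" and "Final Answer:" text conventions; classify_step, FORMAT_REMINDER and fallback_answer are hypothetical names. When a step is classified as thought_only, the application can send FORMAT_REMINDER back to the model instead of silently looping.

```python
import re
from typing import List

FINAL_RE = re.compile(r"^Final Answer:\s*(.+)", re.MULTILINE | re.DOTALL)
ACTION_RE = re.compile(r"^Action:\s*\w+\[.*?\]", re.MULTILINE | re.DOTALL)

FORMAT_REMINDER = (
    "Your last message contained neither an Action nor a Final Answer. "
    "If you already know the answer, reply with 'Final Answer: <answer>'."
)


def classify_step(text: str) -> str:
    """Return 'final', 'action' or 'thought_only' for one model response."""
    if FINAL_RE.search(text):
        return "final"
    if ACTION_RE.search(text):
        return "action"
    return "thought_only"


def fallback_answer(steps: List[str]) -> str:
    """Build a clearly labelled answer when the iteration cap is hit."""
    last = steps[-1].strip() if steps else "(no steps recorded)"
    return f"No Final Answer was produced within the limit. Last step: {last}"
```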

Prompt injection via Observation content · Critical

Content returned from a tool (a page, email, SQL row) contains instructions trying to hijack the agent — the canonical attack vector against a TAO loop. Mitigation: isolate Observation, mark it untrusted, enforce tool-permission policies.

Fix: Observation isolation, sanitisation, permission policies.
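Isolation and a permission policy could be combined roughly as below. The delimiter strings, the function names wrap_observation and authorise, and the three-way allow / ask_user / deny outcome are all assumptions made for illustration. Delimiters lower the risk but do not remove it, so the permission check is the stronger of the two controls.

```python
from typing import FrozenSet

UNTRUSTED_OPEN = "<untrusted_tool_output>"
UNTRUSTED_CLOSE = "</untrusted_tool_output>"


def wrap_observation(raw: str) -> str:
    """Mark tool output as untrusted data before it enters the context."""
    # Escape every "<" so the content cannot contain a closing delimiter
    # and break out of the wrapper.
    cleaned = raw.replace("<", "&lt;")
    return f"{UNTRUSTED_OPEN}\n{cleaned}\n{UNTRUSTED_CLOSE}"


def authorise(
    tool: str,
    read_only: FrozenSet[str],
    side_effecting: FrozenSet[str],
    context_tainted: bool,
) -> str:
    """Decide whether an Action may run: 'allow', 'ask_user' or 'deny'."""
    if tool in read_only:
        return "allow"
    if tool in side_effecting:
        # Once untrusted content is in context, side effects need a human.
        return "ask_user" if context_tainted else "allow"
    return "deny"  # unknown tools are never executed
```

Here context_tainted would be set to True as soon as any Observation from an external source has been added to the history.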

Evolution

Original paper · 2022 · ICLR 2023 / Princeton & Google Research · Shunyu Yao

ReAct: Synergizing Reasoning and Acting in Language Models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao

2022

ReAct defines the Thought–Action–Observation loop

Inflection point

Yao et al. (Princeton + Google) formalise interleaving reasoning (Thought) and acting (Action) steps with environment feedback (Observation), the foundation of most later agent loops.

ReAct (concept) · ReAct: Synergizing Reasoning and Acting in Language Models (paper)

2023

LangChain Agent Executor popularises TAO in production

LangChain ships the ReAct Agent and Agent Executor, making the "Thought / Action / Action Input / Observation" loop the default industry pattern for LLM agents.

2023

Reflexion adds self-critique to the loop

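Shinn et al. extend TAO with a verbal reflection step: after a failed trial the agent writes a self-critique of its prior actions and observations, stores it in memory and uses it on the next attempt, which improves success rates across trials without updating model weights.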

Reflexion (concept) · Reflexion: Language Agents with Verbal Reinforcement Learning (paper)

2023

Function calling moves Action into the native API

Inflection point

OpenAI Function Calling (June 2023) and follow-ups replace free-text "Action:" with structured JSON calls, while Observation becomes a dedicated "tool" role message — TAO's canonical implementation moves from prompt to API schema.

Function Calling (concept)

2024

LangGraph and the Hugging Face Agents course standardise the terminology

LangGraph (a state-graph framework in which the agent loop typically alternates between a model node and a tool node) and the Hugging Face Agents course (chapter "The Thought-Action-Observation Cycle") cement TAO as the canonical, vendor-neutral description of an agent loop.
