On each agent iteration the model produces a structured step with three sections. (1) Thought: a short natural-language reasoning note about what the model knows and what it plans next. (2) Action: a tool invocation in a fixed format (JSON, function call, or a special token like "Action: search[query]"). (3) Observation: the action's result (API return, document chunk, error) injected into context as a separate message or text block. The model sees the full (T,A,O,T,A,O,...) history on the next turn and either emits another TAO step or produces a "Final Answer". The loop is bounded by max_iterations, a token budget and the Final-Answer stop rule.
A bare LLM, even with function calling, often jumps straight to action, loses the thread or hallucinates a tool result before the real one arrives. TAO enforces a three-phase discipline: reason, act, observe — separating reasoning from execution, easing debugging and enabling multi-step problem solving with tools.
The model emits an Action and a fake Observation in the same turn, before the tool actually replies. Mitigation: parse the response, cut at Action, stop on a stop-token, and require role "tool" as the only source of Observation.
The agent repeats near-identical Actions and never progresses. Mitigation: action deduplication, a hard max_iterations cap, a watchdog detecting lack of new observations, optionally a Reflexion phase.
A tool result (a whole web page, a SQL result with hundreds of rows) fills the context window in 2–3 iterations. Mitigation: truncate Observation, summarise, retrieval-on-result, pagination.
The model puts final conclusions inside Thought instead of emitting "Final Answer", so the app never ends the loop. Mitigation: crisp rules in the system prompt, few-shot examples with a proper Final Answer, a parser enforcing the format.
Content returned from a tool (a page, email, SQL row) contains instructions trying to hijack the agent — the canonical attack vector against a TAO loop. Mitigation: isolate Observation, mark it untrusted, enforce tool-permission policies.
Yao et al. (Princeton + Google) formalise interleaving reasoning (Thought) and acting (Action) steps with environment feedback (Observation) — the foundation of every later agent loop.
LangChain ships the ReAct Agent and Agent Executor, making the "Thought / Action / Action Input / Observation" loop the default industry pattern for LLM agents.
Shinn et al. extend TAO with a reflection step over prior observations, showing that agents learn faster when they explicitly self-critique between iterations.
OpenAI Function Calling (June 2023) and follow-ups replace free-text "Action:" with structured JSON calls, while Observation becomes a dedicated "tool" role message — TAO's canonical implementation moves from prompt to API schema.
LangGraph (a state graph with Thought/Action/Observation nodes) and the Hugging Face Agents course (chapter "The Thought-Action-Observation Cycle") cement TAO as the canonical, vendor-neutral description of an agent loop.
Maximum number of full T-A-O loops before forcing a Final Answer.
How long and detailed Thought sections should be — trades cost against reasoning quality.
Cap on Observation length (e.g. first N tokens of a tool result) — protects the context window.
Loop stop rule: a "Final Answer" emission, no new Action for N steps, hitting max_iterations, exceeding the token budget.
Two competing implementation idioms exist: (a) free-form ReAct-style "Thought:/Action:/Observation:" embedded in a prompt, (b) structured tool-use with explicit "assistant" / "tool" roles in the chat API.
The model itself decides when to end a Thought and emit an Action, and when — after several iterations — to switch to Final Answer. Routing is implicit in the output-token space, steered by the system prompt and few-shot examples.
TAO is sequential by definition — Observation depends on Action, the next Thought depends on Observation. Parallelism only enters via extensions (parallel tool calls inside a single Action, multi-agent fan-out at a higher layer).
TAO is an agent-orchestration-level pattern — it runs anywhere a tool-use-capable LLM runs, regardless of the hardware backend.