The harness surrounding a model defines: (1) an execution loop โ the strategy for successive steps (ReAct, plan-and-execute, reflection); (2) a tool interface โ JSON schemas, parameter validation, retry logic; (3) a sandbox โ isolated code execution environment (Docker/microVM); (4) memory โ session context, scratchpad, long-term memory; (5) a system prompt and guardrails; (6) observability โ traces, logs, evaluation. The infrastructure adds an inter-agent layer covering: (a) attribution of actions to a person or organization, (b) communication protocols (MCP for tools, A2A for agents), (c) reputation and oversight systems, (d) intervention mechanisms.
Agentic systems built directly on a raw LLM are uncontrollable, unsafe, and unmeasurable: no control loop, no execution isolation, no action attribution, no exchange protocols. Harness/Infrastructure provides a reusable engineering layer that gives agents observability, safety, identity, and the ability to collaborate.
Step strategy: ReAct, plan-and-execute, reflection, tree-of-thoughts.
JSON schemas, parameter validation, retries, mapping to MCP / function calling.
Isolated environment (Docker, microVM, Firecracker) for the agent's code and system operations.
Session context, scratchpad, long-term memory (vector store), and history compression.
Allowed tools, spend caps, escalation rules, risk classes, allowlists/blocklists.
Agent identity, action signing, mapping to a human/organizational principal (Chan et al. 2025).
Protocols such as A2A (Google) and MCP (Anthropic) โ standards for agent-to-agent and agent-to-tool exchanges.
Step tracing, tool logs, offline/online evaluation, replay and debugging.
Early frameworks overwhelm the model with layers of abstraction; Anthropic recommends starting with simple loops.
Running agent-generated code in the host process is a critical security vulnerability.
The absence of step, time, and spending limits leads to infinite loops and uncontrolled API costs.
Without identity and signing, it is impossible to trace who or what performed a given action โ blocking audit and enterprise adoption.
Naively concatenating history and tool results quickly overflows the context window; compression or summarization is required.
Tool output may contain malicious instructions; the harness must treat all outputs as untrusted data.
Without step-level tracing, debugging an agent is practically impossible.
Tight coupling to a specific SDK makes model or provider migration difficult.
ReAct vs plan-and-execute vs reflection vs tree-of-thoughts.
Number and granularity of tools exposed; competence-vs-hallucination trade-off.
None / process / container / microVM / hardware enclave.
Current context window only vs persistent long-term memory.
Permissive (dev) vs strict (prod) โ caps, allowlists, human escalation.
Anonymous agent vs named agent bound to a human/organizational principal.