A multi-agent system (MAS) defines a set of agents, a communication environment, and a coordination protocol. Each agent is an autonomous unit with: (1) a cognitive core (typically an LLM with a system prompt defining its role), (2) short-term memory (context) and optionally long-term memory (vector store), (3) a tool-use interface (APIs, code execution, web search, retrieval), and (4) a goal or task instruction. Task execution unfolds in stages: (a) decomposition: an orchestrator or planner agent breaks the query into subtasks; (b) routing: subtasks are assigned to agents based on roles and capabilities; (c) communication: agents exchange messages (natural language, structured function calls, shared memory); (d) iteration: agents execute actions, observe results, and correct the plan (often with a built-in critic agent); (e) aggregation: results are synthesized into a final answer. Coordination topologies include sequential pipeline (chain), hierarchical (orchestrator-worker), parallel fan-out/fan-in, blackboard (shared workspace), debate (agents compete/discuss), publish-subscribe, and decentralized peer-to-peer.
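The stages above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Subtask` type, the `decompose`, `worker`, and `run` functions, and the `AGENTS` table are hypothetical names, and the plain Python functions stand in for LLM-backed agents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    role: str     # which agent role should handle this item
    payload: str  # the work to be done


def decompose(query: str) -> List[Subtask]:
    # (a) decomposition: a real planner would call an LLM; here we split on " and "
    return [Subtask(role="worker", payload=part.strip()) for part in query.split(" and ")]


def worker(payload: str) -> str:
    # Stand-in for an LLM-backed worker agent (assumption: it just upper-cases text)
    return payload.upper()


# (b) routing table: role name -> agent callable
AGENTS: Dict[str, Callable[[str], str]] = {"worker": worker}


def run(query: str) -> str:
    subtasks = decompose(query)
    # (b) routing and (d) execution: look up the agent for each role and call it
    results = [AGENTS[task.role](task.payload) for task in subtasks]
    # (e) aggregation: join partial results into one answer
    return "; ".join(results)


answer = run("find sources and summarize them")
print(answer)  # FIND SOURCES; SUMMARIZE THEM
```

A critic step would fit between execution and aggregation; it is left out here to keep the sketch short.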
A single LLM faces hard limits on long-horizon tasks, tasks requiring diverse domain expertise, multi-step planning, or parallel processing of multiple information sources. A monolithic prompt rapidly loses coherence, context degrades (context rot), and the lack of specialized roles prevents separation of concerns (e.g., planner vs executor vs critic). MAS addresses this by decomposing the problem into subtasks assigned to autonomous agents with dedicated roles, tools, and memory, coordinated by a communication topology. This allows system complexity to scale without enlarging a single model, enables parallel solution exploration, and supports peer-review / critique mechanisms between agents.
Autonomous computational entity with its own internal state, perception, reasoning, and action capabilities. In LLM-based MAS: a language model with a system prompt defining the agent's role, goal, and constraints.
Official
Mechanism for information exchange between agents. Can take the form of direct message passing, shared memory, publish-subscribe systems, or event queues.
Official
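As one illustration of the publish-subscribe form of the communication layer, the sketch below uses an in-process bus. The `MessageBus` class and its method names are assumptions made for this example; a production system would more likely use a message broker or event queue.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class MessageBus:
    """In-memory publish-subscribe bus (illustrative only, not thread-safe)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Register an agent's handler for a topic
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        # Deliver the message to every subscriber; return how many received it
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = MessageBus()
critic_inbox: List[dict] = []
bus.subscribe("draft.ready", critic_inbox.append)
delivered = bus.publish("draft.ready", {"from": "writer", "text": "v1"})
```

The publisher does not need to know which agents are listening, which is the main difference from direct message passing.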
Component responsible for workflow management: task decomposition, routing to agents, dependency management, error handling, and aggregating final results. Can be an LLM agent or a programmatic controller.
Official
State storage mechanisms: short-term memory (conversation context / LLM context window), long-term memory (vector database, external database), episodic memory (interaction history), and procedural memory (learned procedures).
Official
Layer integrating agents with external systems: APIs, search engines, code interpreters, databases, file services. Enables agents to act beyond their textual context.
Official
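A tool interface is often built as a registry that maps tool names to callables and returns structured results. The sketch below assumes a simple request shape (`{"tool": ..., "args": {...}}`); the `TOOLS` registry, the `tool` decorator, and `call_tool` are hypothetical names, not any framework's API.

```python
from typing import Any, Callable, Dict

TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    # Decorator that registers a function under a tool name
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("add")
def add(a: float, b: float) -> float:
    return a + b


def call_tool(request: dict) -> dict:
    # Dispatch a structured request; report errors as data so the agent can react
    name = request.get("tool")
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": TOOLS[name](**request.get("args", {}))}
    except Exception as exc:  # tool failures are returned, not raised
        return {"ok": False, "error": str(exc)}


response = call_tool({"tool": "add", "args": {"a": 2, "b": 3}})
```

Returning errors as data rather than raising lets the calling agent observe the failure and revise its plan.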
Incorrect output from one agent is passed as input to the next, leading to error accumulation across the pipeline. Without a validator agent, this can result in completely incorrect final outputs.
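One way to limit such cascades is to put a validation gate after every stage, so bad output is retried or rejected before the next agent consumes it. The sketch below is an assumption-laden illustration: `run_pipeline`, `ValidationError`, and the stage format (an agent paired with a validator) are hypothetical, and the validators are trivial checks rather than a real validator agent.

```python
from typing import Callable, List, Tuple

# Each stage pairs an agent with a validator for that agent's output
Stage = Tuple[Callable[[str], str], Callable[[str], bool]]


class ValidationError(Exception):
    pass


def run_pipeline(stages: List[Stage], data: str, max_retries: int = 1) -> str:
    for index, (agent, is_valid) in enumerate(stages):
        for _ in range(max_retries + 1):
            output = agent(data)
            if is_valid(output):
                break  # output accepted, move on to the next stage
        else:
            # No attempt passed validation: stop instead of propagating bad output
            raise ValidationError(f"stage {index} failed after {max_retries + 1} attempts")
        data = output
    return data


stages: List[Stage] = [
    (str.strip, lambda s: len(s) > 0),  # stage 0: must produce non-empty text
    (str.upper, str.isupper),           # stage 1: must produce upper-case text
]
result = run_pipeline(stages, "  hello ")
```

With deterministic stub agents a retry changes nothing; with sampled LLM output a retry may succeed where the first attempt failed.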
Each inter-agent communication consumes tokens (passing conversation history, context, tools). With many agents and long communication chains, costs can grow disproportionately fast.
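A back-of-the-envelope calculation shows why cost can outpace agent count. The model below assumes a sequential chain in which every agent receives the full history so far and appends a fixed number of tokens; the function name and the token figures are illustrative, not measurements.

```python
def chain_input_tokens(n_agents: int, prompt_tokens: int, output_tokens_per_agent: int) -> int:
    """Total input tokens when each agent in a chain reads the whole history."""
    total = 0
    history = prompt_tokens
    for _ in range(n_agents):
        total += history                    # this agent reads everything so far
        history += output_tokens_per_agent  # and appends its own output
    return total


three = chain_input_tokens(3, prompt_tokens=1000, output_tokens_per_agent=500)
six = chain_input_tokens(6, prompt_tokens=1000, output_tokens_per_agent=500)
print(three, six)  # 4500 13500
```

Under these assumptions the total is n*p + m*n*(n-1)/2 for n agents, prompt size p, and per-agent output m, so doubling the chain from 3 to 6 agents triples the input tokens. Passing summaries instead of full history is one way to flatten this curve.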
Agents can enter loops, e.g., one agent requests a revision from another, which requests clarification back, with no termination mechanism. Without explicit stopping criteria, such exchanges can run indefinitely.
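A common guard combines a hard round limit with detection of repeated states. The sketch below is illustrative: `negotiate` and its `author` and `reviewer` callables are hypothetical stand-ins for two agents, with the reviewer returning `None` to signal approval.

```python
from typing import Callable, Optional, Tuple


def negotiate(
    author: Callable[[str, str], str],
    reviewer: Callable[[str], Optional[str]],
    draft: str,
    max_rounds: int = 5,
) -> Tuple[str, str]:
    seen = {draft}
    for round_number in range(1, max_rounds + 1):
        feedback = reviewer(draft)
        if feedback is None:  # reviewer approves
            return draft, f"approved in round {round_number}"
        draft = author(draft, feedback)
        if draft in seen:     # same draft again: the agents are cycling
            return draft, "stopped: repeated draft"
        seen.add(draft)
    return draft, "stopped: round limit"


# A reviewer that never approves is cut off by the round limit
final, status = negotiate(lambda d, f: d + "!", lambda d: "revise", "x", max_rounds=3)
print(final, "/", status)  # x!!! / stopped: round limit
```

Token or wall-clock budgets can serve as additional stopping criteria alongside the round limit.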
During parallel execution, agents may operate on inconsistent versions of shared state, leading to conflicts and race conditions, especially when shared memory is non-transactional.
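One mitigation is optimistic concurrency: every write must name the version it was based on, and stale writes are rejected. The `VersionedStore` class below is a hypothetical in-process sketch; a real deployment would more likely rely on a database or key-value store with transactional or compare-and-set semantics.

```python
import threading
from typing import Any, Tuple


class VersionedStore:
    """Shared state with compare-and-set writes (illustrative only)."""

    def __init__(self, value: Any) -> None:
        self._lock = threading.Lock()
        self._value = value
        self._version = 0

    def read(self) -> Tuple[Any, int]:
        with self._lock:
            return self._value, self._version

    def compare_and_set(self, new_value: Any, expected_version: int) -> bool:
        # Accept the write only if nobody else has written since the caller's read
        with self._lock:
            if self._version != expected_version:
                return False
            self._value = new_value
            self._version += 1
            return True


store = VersionedStore("plan v0")
_, version_seen = store.read()                                      # both agents read version 0
first_ok = store.compare_and_set("plan from A", version_seen)       # agent A wins
second_ok = store.compare_and_set("plan from B", version_seen)      # agent B is stale
```

The rejected agent can re-read the state, merge or rebase its change, and try again.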
Agents with poorly defined roles may duplicate work, conflict in decision-making, or skip tasks because each assumes another agent will handle it.
Wooldridge and Jennings publish 'Intelligent Agents: Theory and Practice' (Knowledge Engineering Review), defining agent properties (autonomy, reactivity, pro-activeness, social ability) and foundations of MAS theory.
Textbook standardizing MAS terminology and architecture, becoming the primary academic reference for the following decade.
2023 sees the first wave of LLM-based MAS frameworks: CAMEL (Li et al., March 2023), AutoGen (Wu et al., Microsoft, August 2023), MetaGPT (Hong et al., August 2023), transforming MAS from rule-based to language-model-based systems.
Anthropic announces Model Context Protocol (MCP), Google announces Agent-to-Agent Protocol (A2A). More mature frameworks emerge: LangGraph (LangChain), CrewAI, supporting complex agent topologies with state persistence and controlled flow.
MAS has no single canonical benchmark; it is evaluated in the context of a specific application. Most-cited results: (1) MetaGPT reports 85.9% pass@1 on HumanEval and 87.7% on MBPP (vs ~70% single-agent GPT-4 baseline from 2023). (2) AutoGen shows 15-30% improvement over single-agent on math (MATH) and coding tasks via a critic agent. (3) Multi-Agent Debate (Du et al., 2023) improves accuracy on MMLU and GSM8K by 5-20 pp over a single model at ~3× token cost. (4) Generative Agents (Park et al., 2023): qualitatively evaluated by human raters in the Smallville environment; no numeric metrics. (5) ChatDev: complete software projects delivered in minutes for a few dollars. Important caveats: MAS rarely beats the strongest single-model baseline (e.g., GPT-5) on simple single-shot queries; the advantage shows up on long-horizon and specialization-heavy tasks.
Number of agents in the system. Higher agent count can increase parallelism and specialization but complicates coordination and increases cost.
Communication and dependency pattern between agents: sequential chain, hierarchy (orchestrator-worker), peer-to-peer mesh, publish-subscribe, parallel fan-out/fan-in, or hybrid DAG.
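A hybrid DAG topology can be written down as a dependency map and ordered with the standard library. The agent names below are hypothetical; `graphlib.TopologicalSorter` (Python 3.9+) expects each key to map to the set of nodes it depends on.

```python
from graphlib import TopologicalSorter

# Each agent lists the agents whose output it needs first
DEPENDENCIES = {
    "planner": set(),
    "search": {"planner"},
    "coder": {"planner"},
    "critic": {"search", "coder"},
}

# static_order() yields an execution order that respects every dependency
order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(order)
```

Here `planner` always runs first and `critic` last; `search` and `coder` have no dependency on each other, so they may appear in either order and could run in parallel.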
Type and scope of memory allocated to each agent and the system as a whole: context-only (within LLM context window), external long-term (vector database), shared across agents, or private per agent.
Mode and entry points for human participation: none (full autonomy), approval at every step, approval at key decisions, or intervention only on errors.
Degree to which individual agents have specialized roles (e.g., search agent, coding agent, verification agent) versus general-purpose agents.
Not all agents are active for every task; the set of active agents depends on the nature of the task and the orchestrator's dynamic routing.
Orchestrator or routing mechanism decides which agent should handle a given subtask, based on agent role, capabilities, current system state, or results from previous agents.
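A capability-based router can be sketched as a lookup that also consults current system state. Everything here is an assumption for illustration: the `AGENT_CAPABILITIES` table, the `route` function, and the tie-breaking rule of preferring the most specialized agent.

```python
from typing import Dict, FrozenSet, Optional, Set

# Hypothetical agents and the capabilities each one advertises
AGENT_CAPABILITIES: Dict[str, Set[str]] = {
    "search_agent": {"search", "retrieval"},
    "coding_agent": {"code", "tests"},
    "generalist_agent": {"search", "retrieval", "code", "tests", "review"},
}


def route(required: Set[str], busy: FrozenSet[str] = frozenset()) -> Optional[str]:
    # Candidates must cover every required capability and be currently available
    candidates = [
        name
        for name, capabilities in AGENT_CAPABILITIES.items()
        if required <= capabilities and name not in busy
    ]
    if not candidates:
        return None
    # Prefer the most specialized candidate (fewest capabilities)
    return min(candidates, key=lambda name: len(AGENT_CAPABILITIES[name]))


chosen = route({"code"})
fallback = route({"code"}, busy=frozenset({"coding_agent"}))
```

An LLM-based router would replace the set comparison with a model call, but the inputs (role, capabilities, system state) stay the same.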
A MAS can be fully parallel (agents operating asynchronously on independent subtasks), partially sequential (with synchronization barriers), or hybrid, depending on the task topology and its dependency graph.
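The parallel case with a single synchronization barrier (fan-out/fan-in) maps naturally onto `asyncio.gather`. In this sketch the `agent` coroutine is a hypothetical stand-in for an LLM or tool call, and the short sleep only simulates latency.

```python
import asyncio
from typing import List


async def agent(name: str, subtask: str) -> str:
    # Stand-in for an asynchronous LLM or tool call
    await asyncio.sleep(0.01)
    return f"{name}:{subtask}"


async def fan_out_fan_in(subtasks: List[str]) -> str:
    # Fan-out: start one agent per independent subtask.
    # gather() is the barrier; it returns results in the order the calls were passed.
    partials = await asyncio.gather(
        *(agent(f"agent{i}", task) for i, task in enumerate(subtasks))
    )
    # Fan-in: aggregate the partial results
    return " | ".join(partials)


result = asyncio.run(fan_out_fan_in(["a", "b", "c"]))
print(result)  # agent0:a | agent1:b | agent2:c
```

A partially sequential system would chain several such barriers, one per layer of the dependency graph.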
MAS is an architectural pattern operating above the hardware layer. Individual agents may run on GPUs (LLM inference), CPUs (orchestration logic), or in the cloud, depending on the implementation.