
RecursiveMAS

2026 · Research · Published
Key innovation
Agents in a multi-agent system communicate via last-layer hidden states instead of generated text tokens, eliminating the decode/encode overhead between models.
Category
Architecture
Abstraction level
Pattern
Operation level
System · Inference · Training · Agent runtime · Orchestration
Use cases
Multi-agent math reasoning pipelines
Multi-agent code generation systems
Multi-agent QA with retrieval and verification
Medical agents with multi-step diagnostics
Heterogeneous LLM teams (different foundation models in one system)
Token cost reduction in production multi-agent deployments

How it works

The architecture consists of two variants of the RecursiveLink module — a lightweight two-layer network. The Inner RecursiveLink operates inside a single agent: instead of decoding text during intermediate reasoning, it maps the generated last-layer embeddings back into the same model's input space, creating a loop of "hidden thoughts". The Outer RecursiveLink bridges different agents: it aligns embeddings across models with different representation dimensions (e.g. Qwen ↔ Llama-3 ↔ Gemma3 ↔ Mistral). Foundation model weights are frozen — gradients train only RecursiveLink parameters, about 0.31% of the total. If two agents share the same foundation model in different roles, the GPU loads a single model copy and two RecursiveLink parameter sets.
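
A minimal PyTorch sketch of what such a two-layer module could look like (the hidden width, activation, and all dimensions here are illustrative assumptions, not the authors' configuration):

import torch
import torch.nn as nn

class RecursiveLink(nn.Module):
    """Two-layer bridge mapping last-layer hidden states into an input
    embedding space. Inner variant: d_in == d_out (same model).
    Outer variant: d_in != d_out (adjacent models with different dims)."""
    def __init__(self, d_in: int, d_out: int, d_hidden: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, d_in) -> (batch, seq_len, d_out)
        return self.net(h)

# Inner link for a single model (hypothetical 3584-dim hidden size):
inner = RecursiveLink(d_in=3584, d_out=3584)
# Outer link bridging two different models (hypothetical 3584 -> 4096):
outer = RecursiveLink(d_in=3584, d_out=4096)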

Problem solved

Standard multi-agent systems (MAS) waste compute on text-based communication: each agent decodes its reasoning into tokens, and the next agent re-encodes those tokens into embeddings. This double translation adds latency, inflates token consumption, and blocks end-to-end gradient training, since the decode step is non-differentiable.

Key mechanisms

Communication via last-layer hidden states instead of generated text tokens
Recursive agent loop — the final agent's output feeds back to the first, opening another reasoning round (see the sketch after this list)
Inner RecursiveLink — intra-agent loop mapping embeddings back into the model's input space
Outer RecursiveLink — inter-agent bridge aligning embeddings across models with different dimensions
Freezing foundation model weights and training only lightweight RecursiveLink modules (~0.31% of total parameters)
Text decoding happens only once, at the end of the final recursion round
Sharing a single foundation model copy across agents playing different roles in the system
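
As a control-flow sketch only (toy linear layers stand in for real LLMs; the chain length, round count, and dimensions are illustrative), the recursive loop with a single decode at the end could be organized like this:

import torch
import torch.nn as nn

D = 16  # toy hidden size; real models use thousands of dimensions

class ToyAgent(nn.Module):
    # Stand-in for a frozen foundation model: embeddings in, hidden states out.
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(D, D)
        for p in self.parameters():
            p.requires_grad_(False)  # foundation weights stay frozen

    def forward(self, x):  # x: (batch, seq_len, D)
        return self.body(x)

def make_link():  # two-layer RecursiveLink stand-in
    return nn.Sequential(nn.Linear(D, 2 * D), nn.GELU(), nn.Linear(2 * D, D))

agents = [ToyAgent() for _ in range(3)]  # sequential chain of three agents
links = [make_link() for _ in agents]    # bridge after each agent; the last
                                         # one closes the loop back to agent 1
h = torch.randn(1, 4, D)                 # initial prompt embeddings
for _ in range(3):                       # recursion rounds
    for agent, link in zip(agents, links):
        h = link(agent(h))               # hidden states cross directly, no text
final_hidden = h
# Only here would a real system decode `final_hidden` into text, exactly once.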

Strengths & limitations

Strengths
2.4x faster inference compared to text-based multi-agent systems
75.6% reduction in token usage at the third recursion round vs Recursive-TextMAS
Average 8.3% higher accuracy than the strongest baseline methods across 9 benchmarks
Training cost more than 2x lower than full fine-tuning — only ~13M parameters are updated
Operates heterogeneously — combines models from different families (Qwen, Llama-3, Gemma3, Mistral) in one system
GPU memory savings by sharing a single foundation model copy across multiple agent roles
Code and model weights released publicly under Apache 2.0 (GitHub, Hugging Face)
Limitations
Sequential chain nature — agents in one round must execute one after another (output→input), with no intra-round parallelism
Validation conducted mostly for 3–4 agents; behavior at larger scales remains an open research question
Effectiveness on very long contexts has not yet been measured
Experiments limited to open models (Qwen, Llama-3, Gemma3, Mistral) — no validation for closed models or MoE architectures
Hidden-state passing requires model version compatibility — updating one foundation model may require RecursiveLink retraining
Lack of interpretability of intermediate steps — reasoning happens in embedding space, not in human-readable text
End-to-end training still requires backpropagation through a chain of large models despite their frozen weights (see the training sketch after this list)
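
A minimal training-loop sketch of that setup (the loss, optimizer, and sizes are illustrative, not the paper's): the foundation model's parameters receive no updates, yet its activations sit on the gradient path to every RecursiveLink, which is where the cost comes from.

import torch
import torch.nn as nn

D = 16
frozen = nn.Linear(D, D)                  # stand-in for a foundation model
for p in frozen.parameters():
    p.requires_grad_(False)               # frozen, but still in the autograd graph

link_a = nn.Sequential(nn.Linear(D, 2 * D), nn.GELU(), nn.Linear(2 * D, D))
link_b = nn.Sequential(nn.Linear(D, 2 * D), nn.GELU(), nn.Linear(2 * D, D))

opt = torch.optim.AdamW(
    list(link_a.parameters()) + list(link_b.parameters()), lr=1e-4
)

x, target = torch.randn(8, D), torch.randn(8, D)
out = link_b(frozen(link_a(x)))           # backprop must traverse `frozen`
loss = nn.functional.mse_loss(out, target)
loss.backward()                           # gradients flow through the frozen model
opt.step()                                # only the two links are updated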

Components

Inner RecursiveLink · Intra-agent hidden-thought loop

Lightweight two-layer network inside a single agent. Maps the model's last hidden states back into its input space, creating an intermediate reasoning loop without text generation.

Outer RecursiveLink · Inter-agent embedding bridge

Lightweight two-layer bridge network connecting two adjacent agents. Aligns hidden states from one model to the input space of the other when models have different embedding dimensions.

Frozen foundation models · Reasoning agents

Pretrained LLMs (tested on Qwen, Llama-3, Gemma3, Mistral) acting as agents. Weights remain frozen during training — only RecursiveLink parameters are updated.

Evolution

Original paper · 2026 · Preprint (UIUC, Stanford)
RecursiveMAS: Recursive Multi-Agent Systems with Hidden-State Communication
2026 · RecursiveMAS publication · Inflection point

UIUC and Stanford researchers release the framework along with code and weights under Apache 2.0 (GitHub, Hugging Face).

Technical details

Hyperparameters (configurable axes)

Number of recursion rounds · Critical

How many times the agent chain executes before producing the final answer. More rounds improve accuracy but scale compute linearly.

Number of agents in the chain · High

Number of foundation models participating in a single recursion round. Validated up to 3–4 agents; scaling beyond remains an open research question.

RecursiveLink hidden dimension · Medium

Internal dimension of the two-layer RecursiveLink network. Affects total parameter count (~13M in the reference configuration).
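
A back-of-the-envelope check of how this dimension drives parameter count (the 4096/1536 figures are assumptions for illustration; the paper's reference configuration is not spelled out here):

def recursive_link_params(d_in: int, d_hidden: int, d_out: int) -> int:
    # Two weight matrices plus their bias vectors.
    return (d_in * d_hidden + d_hidden) + (d_hidden * d_out + d_out)

# Hypothetical bridge between 4096-dim models with a 1536-dim hidden layer:
print(recursive_link_params(4096, 1536, 4096))  # 12,588,544, roughly 12.6M,
# the same order of magnitude as the reported ~13M total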

Computational complexity

Computational characteristics
Inference: 1.2x–2.4x faster than the text-based multi-agent equivalent (depending on configuration)
Token usage: −34.6% at round one, −75.6% at round three vs Recursive-TextMAS (cumulative effect)
Training: updates ~13M RecursiveLink parameters (~0.31% of the foundation models' total parameter count)
Training cost more than 2x lower than full fine-tuning of the model chain
GPU memory: a single foundation model copy serves multiple agent roles via separate RecursiveLink sets
Accuracy: +8.3% on average against the strongest baselines across 9 benchmarks
Validation scale: 3–4 agents in a single recursion chain
Benchmark notes

RecursiveMAS was tested on 9 benchmarks spanning math, hard sciences and medicine, code generation, and search-augmented question answering. Comparisons included standalone models with LoRA and full fine-tuning, alternative multi-agent frameworks (Mixture-of-Agents, TextGrad), and Recursive-TextMAS (the same recursive scheme but with text-based communication). The average lead over the strongest baselines was 8.3%. The largest margins appeared on reasoning-heavy tasks: +18.1% over TextGrad on AIME2025 and +13% on AIME2026.

Execution paradigm

Primary mode
conditional

Each recursion round activates the full agent chain; conditional mode refers to the number of rounds (state-dependent halting).
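
The halting rule itself is not specified here; one hedged sketch of state-dependent halting (the convergence test, tolerance, and round budget are all assumptions):

import torch

def run_with_halting(round_fn, h, max_rounds=20, tol=1e-3):
    # Repeat the full agent chain `round_fn` until the hidden state stops
    # moving or the round budget is exhausted (illustrative stopping rule).
    for r in range(1, max_rounds + 1):
        h_next = round_fn(h)
        if torch.norm(h_next - h) < tol:  # state-dependent stop
            return h_next, r
        h = h_next
    return h, max_rounds

# Toy usage: a contracting map stands in for one recursion round.
h0 = torch.randn(1, 4, 16)
final, rounds = run_with_halting(lambda h: 0.5 * h, h0)
print(rounds)  # number of rounds the toy map needed before halting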

Activation pattern
stage_dependent

Parallelism

Parallelism level
sequential

Within a single recursion round, agents must be processed sequentially (one agent's output is the next agent's input). Training of individual RecursiveLink modules can still be parallelized across batches.

Scope
inference · across_devices

Hardware requirements

Primary

LLM inference dominates the cost; RecursiveLink adds only lightweight matrix operations on hidden representations.