RecursiveMAS
How it works
The architecture consists of two variants of the RecursiveLink module — a lightweight two-layer network. The Inner RecursiveLink operates inside a single agent: instead of decoding text during intermediate reasoning, it maps the generated last-layer embeddings back into the same model's input space, creating a loop of "hidden thoughts". The Outer RecursiveLink bridges different agents: it aligns embeddings across models with different representation dimensions (e.g. Qwen ↔ Llama-3 ↔ Gemma3 ↔ Mistral). Foundation model weights are frozen — gradients train only RecursiveLink parameters, about 0.31% of the total. If two agents share the same foundation model in different roles, the GPU loads a single model copy and two RecursiveLink parameter sets.
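A minimal sketch of what such a module could look like in PyTorch. The two-layer shape, GELU activation, hidden width, and the `inputs_embeds`-style handoff are assumptions drawn from the description above, not the released implementation.

```python
import torch
import torch.nn as nn

class RecursiveLink(nn.Module):
    """Two-layer bridge from one representation space to another.
    Layer shapes and activation are assumptions, not the reference design."""
    def __init__(self, d_in: int, d_out: int, d_hidden: int = 1536):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_in) last-layer hidden states
        return self.proj(h)  # (batch, seq, d_out) pseudo input embeddings

# Inner loop (d_in == d_out): the mapped states go back into the same model
# as input embeddings, so intermediate "thoughts" never pass through tokens.
inner = RecursiveLink(d_in=4096, d_out=4096)  # 4096: e.g. a Llama-3-8B-sized model
h = torch.randn(1, 16, 4096)                  # stand-in last-layer hidden states
e_next = inner(h)                             # next-step input embeddings
```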
Problem solved
Standard multi-agent systems (MAS) waste compute on text-based communication: each agent decodes its reasoning into tokens and the next agent re-encodes them into embeddings. This double translation increases latency and token consumption, and it hampers end-to-end gradient training because gradients cannot flow through discrete token sampling.
Components
Inner RecursiveLink: a lightweight two-layer network inside a single agent. Maps the model's last hidden states back into its input space, creating an intermediate reasoning loop without text generation.
Outer RecursiveLink: a lightweight two-layer bridge connecting two adjacent agents. Aligns hidden states from one model to the input space of the next when the models have different embedding dimensions (see the sketch after this list).
Foundation models: pretrained LLMs (tested on Qwen, Llama-3, Gemma3, Mistral) acting as agents. Their weights remain frozen during training; only RecursiveLink parameters are updated.
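Reusing the `RecursiveLink` sketch above, the outer bridge and the freezing step might look as follows; the model pairing, the dimensions, and the Hugging Face-style calls are illustrative, not the framework's actual API.

```python
# Outer bridge between agents with different hidden sizes
# (illustrative: 3584 for a Qwen2.5-7B-sized model, 4096 for Llama-3-8B).
qwen_to_llama = RecursiveLink(d_in=3584, d_out=4096)

# Foundation models stay frozen; only link parameters receive gradients.
# for p in qwen.parameters():  p.requires_grad_(False)
# for p in llama.parameters(): p.requires_grad_(False)
#
# h_qwen  = qwen(inputs_embeds=e, output_hidden_states=True).hidden_states[-1]
# e_llama = qwen_to_llama(h_qwen)           # align representation spaces
# out     = llama(inputs_embeds=e_llama)    # next agent consumes embeddings directly
```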
Evolution
UIUC and Stanford researchers release the framework along with code and weights under Apache 2.0 (GitHub, Hugging Face).
Technical details
Hyperparameters (configurable axes)
Recursion rounds: how many times the agent chain executes before the final answer is produced. More rounds improve accuracy but scale compute linearly.
Number of agents: how many foundation models participate in a single recursion round. Validated up to 3–4 agents; scaling beyond that remains an open research question.
RecursiveLink hidden dimension: the internal width of the two-layer RecursiveLink network. It determines the total parameter count (~13M in the reference configuration); see the configuration sketch after this list.
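Collecting the three axes in a config and counting link parameters shows how the hidden width drives the ~13M figure; the 4096 embedding size and all names here are assumptions.

```python
from dataclasses import dataclass

@dataclass
class RecursiveMASConfig:
    num_rounds: int = 3          # recursion rounds: accuracy vs. linear compute
    num_agents: int = 3          # foundation models per round (validated up to 3-4)
    link_hidden_dim: int = 1536  # internal width of each RecursiveLink

def link_params(d_in: int, d_out: int, d_hidden: int) -> int:
    # Two linear layers with biases: (d_in + 1)*d_hidden + (d_hidden + 1)*d_out
    return (d_in + 1) * d_hidden + (d_hidden + 1) * d_out

# With d_in = d_out = 4096 and a 1536-wide hidden layer, one link holds
# ~12.6M weights, in the ballpark of the ~13M reference configuration.
print(link_params(4096, 4096, 1536))  # 12588544
```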
Benchmark results
RecursiveMAS was tested on 9 benchmarks spanning math, hard sciences and medicine, code generation, and search-augmented question answering. Comparisons included standalone models with LoRA and full fine-tuning, alternative multi-agent frameworks (Mixture-of-Agents, TextGrad), and Recursive-TextMAS (the same recursive scheme but with text-based communication). The average lead over the strongest baselines was 8.3%. The largest margins appeared on reasoning-heavy tasks: +18.1% over TextGrad on AIME2025 and +13% on AIME2026.
Execution paradigm
Each recursion round activates the full agent chain; conditionality applies only to the number of rounds, which is set by state-dependent halting (as sketched below).
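A schematic of that control flow; the agent and link interfaces and the halting predicate are placeholders, since the source only states that halting is state-dependent.

```python
def run_recursion(e, agents, links, should_halt, max_rounds=4):
    """Run the full agent chain each round until the state-dependent halting
    predicate fires or max_rounds is reached. All arguments are placeholders."""
    for _ in range(max_rounds):
        for agent, link in zip(agents, links):  # sequential within a round
            h = agent(e)    # embeddings -> last-layer hidden states
            e = link(h)     # hidden states -> next agent's input embeddings
        if should_halt(e):  # conditionality applies only to the round count
            break
    return e                # decoded into text only once, at the very end
```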
Parallelism
Within a single recursion round, agents must be processed sequentially, since each agent's output is the next agent's input. Training of the individual RecursiveLink modules can still be parallelized across batches.
Hardware requirements
LLM inference dominates the cost; RecursiveLink adds only lightweight matrix operations on hidden representations.
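A back-of-the-envelope estimate makes "lightweight" concrete, assuming an 8B-parameter agent and a 4096-dim link with a 1536-wide hidden layer (both figures are assumptions).

```python
# A dense forward pass costs roughly 2 FLOPs per parameter per token.
llm_flops  = 2 * 8e9              # ~16 GFLOPs/token for one frozen 8B agent
link_flops = 2 * 2 * 4096 * 1536  # ~25 MFLOPs/token for one two-layer link
print(f"link / LLM per-token cost: {link_flops / llm_flops:.5f}")  # ~0.00157
```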