RecursiveMAS
How it works
The architecture consists of two variants of the RecursiveLink module — a lightweight two-layer network. The Inner RecursiveLink operates inside a single agent: instead of decoding text during intermediate reasoning, it maps the generated last-layer embeddings back into the same model's input space, creating a loop of "hidden thoughts". The Outer RecursiveLink bridges different agents: it aligns embeddings across models with different representation dimensions (e.g. Qwen ↔ Llama-3 ↔ Gemma3 ↔ Mistral). Foundation model weights are frozen — gradients train only RecursiveLink parameters, about 0.31% of the total. If two agents share the same foundation model in different roles, the GPU loads a single model copy and two RecursiveLink parameter sets.
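A minimal sketch of what such a module could look like in PyTorch. The two-layer shape, GELU activation, hidden width, and the `inputs_embeds`-style handoff are assumptions drawn from the description above, not the released implementation.

```python
import torch
import torch.nn as nn

class RecursiveLink(nn.Module):
    """Two-layer bridge from one representation space to another.
    Layer shapes and activation are assumptions, not the reference design."""
    def __init__(self, d_in: int, d_out: int, d_hidden: int = 1536):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_in) last-layer hidden states
        return self.proj(h)  # (batch, seq, d_out) pseudo input embeddings

# Inner loop (d_in == d_out): the mapped states go back into the same model
# as input embeddings, so intermediate "thoughts" never pass through tokens.
inner = RecursiveLink(d_in=4096, d_out=4096)  # 4096: e.g. a Llama-3-8B-sized model
h = torch.randn(1, 16, 4096)                  # stand-in last-layer hidden states
e_next = inner(h)                             # next-step input embeddings
```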
Problem solved
Standard multi-agent systems (MAS) waste compute on text-based communication: each agent decodes its reasoning into tokens and the next agent re-encodes them into embeddings. This double translation increases latency and token consumption, and it hampers end-to-end gradient training because gradients cannot flow through discrete token sampling.
Components
Inner RecursiveLink: a lightweight two-layer network inside a single agent. Maps the model's last hidden states back into its input space, creating an intermediate reasoning loop without text generation.
Outer RecursiveLink: a lightweight two-layer bridge connecting two adjacent agents. Aligns hidden states from one model to the input space of the next when the models have different embedding dimensions (see the sketch after this list).
Foundation models: pretrained LLMs (tested on Qwen, Llama-3, Gemma3, Mistral) acting as agents. Their weights remain frozen during training; only RecursiveLink parameters are updated.
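Reusing the `RecursiveLink` sketch above, the outer bridge and the freezing step might look as follows; the model pairing, the dimensions, and the Hugging Face-style calls are illustrative, not the framework's actual API.

```python
# Outer bridge between agents with different hidden sizes
# (illustrative: 3584 for a Qwen2.5-7B-sized model, 4096 for Llama-3-8B).
qwen_to_llama = RecursiveLink(d_in=3584, d_out=4096)

# Foundation models stay frozen; only link parameters receive gradients.
# for p in qwen.parameters():  p.requires_grad_(False)
# for p in llama.parameters(): p.requires_grad_(False)
#
# h_qwen  = qwen(inputs_embeds=e, output_hidden_states=True).hidden_states[-1]
# e_llama = qwen_to_llama(h_qwen)           # align representation spaces
# out     = llama(inputs_embeds=e_llama)    # next agent consumes embeddings directly
```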
Evolution
UIUC and Stanford researchers release the framework along with code and weights under Apache 2.0 (GitHub, Hugging Face).
Technical details
Hyperparameters (configurable axes)
Recursion rounds: how many times the agent chain executes before the final answer is produced. More rounds improve accuracy but scale compute linearly.
Number of agents: how many foundation models participate in a single recursion round. Validated up to 3–4 agents; scaling beyond that remains an open research question.
RecursiveLink hidden dimension: the internal width of the two-layer RecursiveLink network. It determines the total parameter count (~13M in the reference configuration); see the configuration sketch after this list.
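Collecting the three axes in a config and counting link parameters shows how the hidden width drives the ~13M figure; the 4096 embedding size and all names here are assumptions.

```python
from dataclasses import dataclass

@dataclass
class RecursiveMASConfig:
    num_rounds: int = 3          # recursion rounds: accuracy vs. linear compute
    num_agents: int = 3          # foundation models per round (validated up to 3-4)
    link_hidden_dim: int = 1536  # internal width of each RecursiveLink

def link_params(d_in: int, d_out: int, d_hidden: int) -> int:
    # Two linear layers with biases: (d_in + 1)*d_hidden + (d_hidden + 1)*d_out
    return (d_in + 1) * d_hidden + (d_hidden + 1) * d_out

# With d_in = d_out = 4096 and a 1536-wide hidden layer, one link holds
# ~12.6M weights, in the ballpark of the ~13M reference configuration.
print(link_params(4096, 4096, 1536))  # 12588544
```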
Benchmark results
RecursiveMAS was tested on 9 benchmarks spanning math, hard sciences and medicine, code generation, and search-augmented question answering. Comparisons included standalone models with LoRA and full fine-tuning, alternative multi-agent frameworks (Mixture-of-Agents, TextGrad), and Recursive-TextMAS (the same recursive scheme but with text-based communication). The average lead over the strongest baselines was 8.3%. The largest margins appeared on reasoning-heavy tasks: +18.1% over TextGrad on AIME2025 and +13% on AIME2026.
Execution paradigm
Each recursion round activates the full agent chain; conditionality applies only to the number of rounds, which is set by state-dependent halting (as sketched below).
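A schematic of that control flow; the agent and link interfaces and the halting predicate are placeholders, since the source only states that halting is state-dependent.

```python
def run_recursion(e, agents, links, should_halt, max_rounds=4):
    """Run the full agent chain each round until the state-dependent halting
    predicate fires or max_rounds is reached. All arguments are placeholders."""
    for _ in range(max_rounds):
        for agent, link in zip(agents, links):  # sequential within a round
            h = agent(e)    # embeddings -> last-layer hidden states
            e = link(h)     # hidden states -> next agent's input embeddings
        if should_halt(e):  # conditionality applies only to the round count
            break
    return e                # decoded into text only once, at the very end
```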
Parallelism
Within a single recursion round, agents must be processed sequentially, since each agent's output is the next agent's input. Training of the individual RecursiveLink modules can still be parallelized across batches.
Hardware requirements
LLM inference dominates the cost; RecursiveLink adds only lightweight matrix operations on hidden representations.
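A back-of-the-envelope estimate makes "lightweight" concrete, assuming an 8B-parameter agent and a 4096-dim link with a 1536-wide hidden layer (both figures are assumptions).

```python
# A dense forward pass costs roughly 2 FLOPs per parameter per token.
llm_flops  = 2 * 8e9              # ~16 GFLOPs/token for one frozen 8B agent
link_flops = 2 * 2 * 4096 * 1536  # ~25 MFLOPs/token for one two-layer link
print(f"link / LLM per-token cost: {link_flops / llm_flops:.5f}")  # ~0.00157
```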