Sakana Fugu: multi-model orchestration that matches frontier AI performance

On June 22, 2026, Sakana AI launched Fugu — a multi-agent orchestration system accessible through a single OpenAI-compatible API endpoint. Rather than relying on one base model, Fugu delegates subtasks to a swappable pool of specialized agents and synthesizes the results. On LiveCodeBench, Fugu Ultra scored 93.2, outperforming the now-unavailable Claude Fable 5 at 89.8.

Key takeaways

Fugu Ultra scored 93.2 on LiveCodeBench — ahead of Claude Fable 5 (89.8) and exceeding Claude Mythos Preview on GPQA-D (Fugu Ultra: 95.5 vs. Mythos: 94.6).
The system operates through a single OpenAI-compatible API, hiding all orchestration complexity from the developer.
Fugu Ultra pricing: $5 per million input tokens, $30 per million output — matching GPT-5.5 but $25 cheaper than Claude Fable 5 ($60 combined).
The product is unavailable in the European Union and EEA pending GDPR compliance resolution.
Sakana AI was founded in 2023 by Llion Jones (co-author of "Attention Is All You Need") and David Ha, former head of research at Stability AI.

What Fugu is and how it works

Fugu is not a model router — it is an orchestrator. Standard routing systems (Not Diamond, Martian, RouteLLM) analyze a query and forward it to one best-fit model. Fugu breaks a task into subtasks, delegates them in parallel or in sequence to multiple models in a managed pool, verifies the outputs, and synthesizes a final result.

The technical foundation is two previously published Sakana research papers: TRINITY and Conductor. The system is itself a language model that can recursively invoke itself and other models from the pool. The specific models in the pool and the routing logic are proprietary — Sakana does not disclose the pool composition or routing mechanism.

The developer sees one endpoint. For a task such as "write a Crossy Road clone in Three.js," Fugu itself decides which models handle structure generation, which handle refactoring, and which handle verification. Direct integration with Codex and other development environments requires no configuration.

Two variants and their use cases

Sakana offers two variants:

Fugu — fast, low-latency, designed for everyday interactive tasks. Integrates directly into environments like Codex without configuration.

Fugu Ultra — the flagship tier for complex tasks: security analysis, long-form research, multi-step patent investigations. On SWE-Bench Pro, Fugu Ultra scored 73.7, beating Claude Opus 4.8 (69.2) and GPT-5.5 (58.6). It remains below Claude Fable 5 (80.0) — withdrawn from public access on June 12, 2026 under a US export control directive.

Fugu Ultra pricing: $5/M input tokens, $30/M output (up to 272K context window). Above 272K: $10/$45. Claude Opus 4.8 costs $5/$25; Claude Fable 5 was priced at $10/$50 combined.

Benchmarks: strengths and limits

Fugu leads where coordination and cross-step verification matter most. On LiveCodeBench, Fugu Ultra scored 93.2, Fugu 92.9 — both above Fable 5 (89.8). On GPQA-Diamond, both Fugu Ultra and Fugu score 95.5, slightly above Mythos Preview (94.6).

However, Fugu does not win unconditionally. On SWE-Bench Pro, Claude Fable 5 (80.0) leads Fugu Ultra (73.7). On long-context recall (MRCRv2), GPT-5.5 takes 94.8 vs. Fugu Ultra's 93.6. On the CTI-REALM cybersecurity benchmark, Claude Opus 4.8 scores 69.6 vs. Fugu Ultra's 69.4. In those domains, a single, highly specialized model still holds the edge.

The geopolitical motivation

Sakana CEO David Ha explicitly cited regulatory risk as the primary argument for Fugu. On June 12, 2026, Anthropic withdrew Claude Fable 5 and Mythos 5 from public access in response to a US export control order from the Trump administration. Companies that had built pipelines on those models lost access overnight.

Fugu promises that if one model in the pool becomes unavailable, the system simply routes around it. Computational sovereignty in practice: no single vendor is a prerequisite for system operation.

A critical industry voice: Elie Bakouch from Prime Intellect noted that Fugu is a closed-source orchestrator running on closed-source models. The user controls neither which models are active nor how — "AI sovereignty" is therefore more a marketing term than a technical fact.

Why this matters

Fugu addresses a real corporate problem. Dependence on a single frontier model provider is an operational risk that materialized dramatically with the Fable and Mythos shutdown — the largest single-event model unavailability in the history of commercial AI.

Multi-model orchestration through a single API is a well-known pattern (LangGraph, AutoGen, CrewAI), but Fugu is the first product to package that complexity in a black-box system priced and performing competitively against monolithic models. For large enterprises with strict compliance requirements — opting out of specific vendors, opting out of training data usage — this offer has real value.

The open question: when will Fugu reach the European market. GDPR compliance within a routing architecture, where users cannot precisely know which model processes their data, is technically complex. Until resolved, the EU market remains inaccessible.

What's next

Fugu available from June 22, 2026 in most regions — excluding the EU/EEA — on subscription (from $20/month) and pay-as-you-go tiers. Subscribers by July 31, 2026 receive a free second month.

Sakana is working on GDPR compliance required to launch Fugu in the European Union.

Six-month results on dynamic benchmarks (LiveCodeBench, SWE-Bench Pro) will be the critical test — whether pool orchestration maintains its edge as increasingly powerful monolithic models are released.