On April 29, 2026, Beijing-based ShengShu Technology announced the launch of Motubrain, a World Action Model (WAM) built on a Mixture-of-Transformers architecture. The system is designed to function as a hardware-agnostic "brain" for diverse robot platforms: industrial, commercial, and domestic. Its distinguishing feature is that perception, world prediction, and action control are learned jointly within a single model, without separate subsystems. Motubrain is backed by a $293 million (2 billion yuan) Series B round led by Alibaba Cloud, with participation from Baidu Ventures and Luminous Ventures.
Beyond the VLA paradigm: Mixture-of-Transformers architecture
The industry has been dominated in recent years by Vision-Language-Action (VLA) models that graft action outputs onto language foundations. ShengShu departs from this pattern in favor of a Mixture-of-Transformers (MoT) architecture, in which three data streams — video, world model, and action control — are processed jointly. The model leverages the generative foundations of ShengShu's Vidu video platform, allowing it to "imagine" future environmental states and compute the inverse dynamics needed to reach them.
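ShengShu has not published Motubrain's internals, but the general Mixture-of-Transformers pattern (separate per-stream transformer parameters attending over one shared token context) can be sketched roughly as below. The stream names, layer sizes, and shared-context scheme here are illustrative assumptions, not the company's actual design.

```python
# Minimal PyTorch sketch of a Mixture-of-Transformers-style layer.
# Assumption: each stream (video, world prediction, action) has its own
# block parameters, but every stream attends over the concatenated tokens
# of all streams. Dimensions are placeholders.
import torch
import torch.nn as nn

class ModalityBlock(nn.Module):
    """One transformer block with modality-specific parameters."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # Each stream's tokens attend over the full shared context.
        attn_out, _ = self.attn(self.norm1(x), context, context)
        x = x + attn_out
        return x + self.ff(self.norm2(x))

class MoTLayer(nn.Module):
    """Joint layer: separate parameters per stream, shared attention context."""
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.blocks = nn.ModuleDict({
            name: ModalityBlock(d_model, n_heads)
            for name in ("video", "world", "action")
        })

    def forward(self, streams: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
        context = torch.cat(list(streams.values()), dim=1)  # all tokens together
        return {name: self.blocks[name](x, context) for name, x in streams.items()}

# Toy usage: batch of 2, a handful of tokens per stream.
layer = MoTLayer()
streams = {
    "video": torch.randn(2, 16, 256),   # visual observation tokens
    "world": torch.randn(2, 16, 256),   # predicted future-state tokens
    "action": torch.randn(2, 8, 256),   # action/control tokens
}
out = layer(streams)
print({k: tuple(v.shape) for k, v in out.items()})
```

In this arrangement each stream keeps its own weights while every token can attend to the others, which is one way to couple perception, world prediction, and control in a single forward pass rather than in separate subsystems.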
Benchmark results
ShengShu presented results on two independent benchmarks. On WorldArena, a global ranking that measures physical reasoning and prediction, Motubrain achieved an EWM Score of 63.77, placing it in the top three. On RoboTwin 2.0, a suite of 50 randomized manipulation tasks, it achieved a 96.0% average success rate, making it the only model on the leaderboard to exceed 95% in randomized environments with varied lighting and object positions.
In task-scaling evaluations, Motubrain's success rate grew with task variety, reaching 92% at 50 tasks and outperforming Pi-0.5 by approximately 37%. ShengShu also claims a 13.55x improvement in data efficiency over traditional methods.
Hardware independence and real-world deployments
Motubrain is designed as a platform-agnostic intelligence layer. The model does not require full retraining when switching hardware — it transfers skills across different robot types. It is already in use in training programs at Astribot, SimpleAI, and Anyverse Dynamics.
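ShengShu has not described the transfer mechanism, but one common way to make such a layer hardware-agnostic is to keep a shared policy core and attach a small per-robot adapter that maps its output into each platform's joint space, so only the adapter changes when the hardware does. The sketch below illustrates that pattern; the module names, dimensions, and frozen-core setup are assumptions for illustration, not Motubrain's published design.

```python
# Hedged sketch of cross-embodiment transfer via per-robot adapters.
# Assumption: an embodiment-agnostic core plus small heads per robot type;
# all names and sizes are illustrative.
import torch
import torch.nn as nn

class SharedCore(nn.Module):
    """Embodiment-agnostic policy trunk (kept frozen when adding new hardware)."""
    def __init__(self, obs_dim: int = 512, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.GELU(),
                                 nn.Linear(256, latent_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class EmbodimentAdapter(nn.Module):
    """Small per-robot head mapping latent actions to that robot's joints."""
    def __init__(self, latent_dim: int, num_joints: int):
        super().__init__()
        self.head = nn.Linear(latent_dim, num_joints)

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        return self.head(latent)

core = SharedCore()
adapters = nn.ModuleDict({
    "arm_7dof": EmbodimentAdapter(128, 7),        # e.g. a 7-joint manipulator
    "humanoid_23dof": EmbodimentAdapter(128, 23), # e.g. a humanoid platform
})

# Switching hardware only swaps (or fine-tunes) the small adapter;
# the shared core's parameters stay fixed.
for p in core.parameters():
    p.requires_grad_(False)

obs = torch.randn(1, 512)
latent = core(obs)
print(adapters["arm_7dof"](latent).shape)        # torch.Size([1, 7])
print(adapters["humanoid_23dof"](latent).shape)  # torch.Size([1, 23])
```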
In real-world tests, Motubrain-trained robots demonstrated emergent "retry" behaviors: when a robot scooping with a ladle comes up empty, it automatically re-attempts the motion, despite never being explicitly trained on failure-recovery data.
Why it matters
Motubrain represents a different philosophy of robotics scaling than the dominant VLA approach. Instead of adding an action head to a language model, ShengShu builds a model that treats motion and perception as a unified generative problem. Benchmark results suggest this approach handles task heterogeneity better — a critical requirement for industrial robots managing hundreds of scenarios simultaneously. The open question: will the claimed 13.55x data efficiency improvement translate to a comparable gain under full production deployment conditions?
What's next
ShengShu announced plans to expand OEM partnerships and deployments into new robotics segments. The key test will be scaling from lab environments to full industrial deployments with 24/7 reliability requirements. Series B funds will be directed toward further model development and data infrastructure buildout.