On April 29, 2026, Beijing-based ShengShu Technology announced the launch of Motubrain, a World Action Model (WAM) built on a Mixture-of-Transformers architecture. The system is designed to function as a hardware-agnostic "brain" for diverse robot platforms: industrial, commercial, and domestic. Its distinguishing feature is that perception, world prediction, and action control are learned jointly within a single model, without separate subsystems. Motubrain is backed by a $293 million (2 billion yuan) Series B round led by Alibaba Cloud, with participation from Baidu Ventures and Luminous Ventures.
Beyond the VLA paradigm: Mixture-of-Transformers architecture
The industry has been dominated in recent years by Vision-Language-Action (VLA) models that graft action outputs onto language foundations. ShengShu departs from this pattern in favor of a Mixture-of-Transformers (MoT) architecture, in which three data streams — video, world model, and action control — are processed jointly. The model leverages the generative foundations of ShengShu's Vidu video platform, allowing it to "imagine" future environmental states and compute the inverse dynamics needed to reach them.
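ShengShu has not published Motubrain's internals, but the general Mixture-of-Transformers pattern (separate per-stream transformer parameters attending over one shared token context) can be sketched roughly as below. The stream names, layer sizes, and shared-context scheme here are illustrative assumptions, not the company's actual design.

```python
# Minimal PyTorch sketch of a Mixture-of-Transformers-style layer.
# Assumption: each stream (video, world prediction, action) has its own
# block parameters, but every stream attends over the concatenated tokens
# of all streams. Dimensions are placeholders.
import torch
import torch.nn as nn

class ModalityBlock(nn.Module):
    """One transformer block with modality-specific parameters."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # Each stream's tokens attend over the full shared context.
        attn_out, _ = self.attn(self.norm1(x), context, context)
        x = x + attn_out
        return x + self.ff(self.norm2(x))

class MoTLayer(nn.Module):
    """Joint layer: separate parameters per stream, shared attention context."""
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.blocks = nn.ModuleDict({
            name: ModalityBlock(d_model, n_heads)
            for name in ("video", "world", "action")
        })

    def forward(self, streams: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
        context = torch.cat(list(streams.values()), dim=1)  # all tokens together
        return {name: self.blocks[name](x, context) for name, x in streams.items()}

# Toy usage: batch of 2, a handful of tokens per stream.
layer = MoTLayer()
streams = {
    "video": torch.randn(2, 16, 256),   # visual observation tokens
    "world": torch.randn(2, 16, 256),   # predicted future-state tokens
    "action": torch.randn(2, 8, 256),   # action/control tokens
}
out = layer(streams)
print({k: tuple(v.shape) for k, v in out.items()})
```

In this arrangement each stream keeps its own weights while every token can attend to the others, which is one way to couple perception, world prediction, and control in a single forward pass rather than in separate subsystems.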
Benchmark results
ShengShu presented results on two independent benchmarks. On WorldArena, a global ranking that measures physical reasoning and prediction, Motubrain achieved an EWM Score of 63.77, placing it in the top three. On RoboTwin 2.0, a suite of 50 randomized manipulation tasks, it achieved a 96.0% average success rate, making it the only model on the leaderboard to exceed 95% in randomized environments with varied lighting and object positions.
In task-scaling evaluations, Motubrain's success rate grew with task variety, reaching 92% at 50 tasks and outperforming Pi-0.5 by approximately 37%. ShengShu also claims a 13.55x improvement in data efficiency over traditional methods.
Hardware independence and real-world deployments
Motubrain is designed as a platform-agnostic intelligence layer. The model does not require full retraining when switching hardware — it transfers skills across different robot types. It is already in use in training programs at Astribot, SimpleAI, and Anyverse Dynamics.
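ShengShu has not described the transfer mechanism, but one common way to make such a layer hardware-agnostic is to keep a shared policy core and attach a small per-robot adapter that maps its output into each platform's joint space, so only the adapter changes when the hardware does. The sketch below illustrates that pattern; the module names, dimensions, and frozen-core setup are assumptions for illustration, not Motubrain's published design.

```python
# Hedged sketch of cross-embodiment transfer via per-robot adapters.
# Assumption: an embodiment-agnostic core plus small heads per robot type;
# all names and sizes are illustrative.
import torch
import torch.nn as nn

class SharedCore(nn.Module):
    """Embodiment-agnostic policy trunk (kept frozen when adding new hardware)."""
    def __init__(self, obs_dim: int = 512, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.GELU(),
                                 nn.Linear(256, latent_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class EmbodimentAdapter(nn.Module):
    """Small per-robot head mapping latent actions to that robot's joints."""
    def __init__(self, latent_dim: int, num_joints: int):
        super().__init__()
        self.head = nn.Linear(latent_dim, num_joints)

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        return self.head(latent)

core = SharedCore()
adapters = nn.ModuleDict({
    "arm_7dof": EmbodimentAdapter(128, 7),        # e.g. a 7-joint manipulator
    "humanoid_23dof": EmbodimentAdapter(128, 23), # e.g. a humanoid platform
})

# Switching hardware only swaps (or fine-tunes) the small adapter;
# the shared core's parameters stay fixed.
for p in core.parameters():
    p.requires_grad_(False)

obs = torch.randn(1, 512)
latent = core(obs)
print(adapters["arm_7dof"](latent).shape)        # torch.Size([1, 7])
print(adapters["humanoid_23dof"](latent).shape)  # torch.Size([1, 23])
```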
In real-world tests, Motubrain-trained robots demonstrated emergent "retry" behaviors: when a robot scooping with a ladle comes up empty, it automatically re-attempts the motion, despite never being explicitly trained on failure-recovery data.
Why it matters
Motubrain represents a different philosophy of robotics scaling than the dominant VLA approach. Instead of adding an action head to a language model, ShengShu builds a model that treats motion and perception as a unified generative problem. Benchmark results suggest this approach handles task heterogeneity better — a critical requirement for industrial robots managing hundreds of scenarios simultaneously. The open question: will the claimed 13.55x data efficiency improvement translate to a comparable gain under full production deployment conditions?
What's next
ShengShu announced plans to expand OEM partnerships and deployments into new robotics segments. The key test will be scaling from lab environments to full industrial deployments with 24/7 reliability requirements. Series B funds will be directed toward further model development and data infrastructure buildout.