Motubrain is a World Action Model developed by ShengShu Technology and announced on April 29, 2026. It replaces a stack of specialized robotic modules with a single unified model that acts as a "brain" for robots operating in physical environments, whether industrial, commercial, or domestic.
Architecturally, Motubrain is a Unified Multimodal Model that treats video and actions as two continuous modalities trained jointly. Its core is a three-stream Mixture-of-Transformers (MoT) integrating video, language, and action experts. A single training cycle endows the system with five abilities at once: Vision-Language-Action control (VLA), world modeling, video generation, Inverse Dynamics Modeling (IDM), and joint video-action prediction.
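To make the three-stream idea concrete, here is a minimal NumPy sketch of a generic Mixture-of-Transformers block. It is not ShengShu's implementation, whose details are not public in this summary. It assumes the common MoT pattern in which each modality (video, language, action) owns its projection and feed-forward weights while self-attention runs jointly over the whole token sequence. All names (`init_expert`, `route`, `mot_block`), the dimensions, and the omission of layer normalization, multiple heads, and attention masks are illustrative simplifications.

```python
import numpy as np

MODALITIES = ("video", "language", "action")  # hypothetical stream order


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_expert(rng, d_model, d_ff):
    """One expert = its own attention projections and feed-forward weights."""
    s = 1.0 / np.sqrt(d_model)
    return {
        "wq": rng.normal(0, s, (d_model, d_model)),
        "wk": rng.normal(0, s, (d_model, d_model)),
        "wv": rng.normal(0, s, (d_model, d_model)),
        "wo": rng.normal(0, s, (d_model, d_model)),
        "w1": rng.normal(0, s, (d_model, d_ff)),
        "w2": rng.normal(0, 1.0 / np.sqrt(d_ff), (d_ff, d_model)),
    }


def route(x, modality_ids, experts, fn):
    """Apply fn(expert, tokens) to each modality's tokens with its own expert."""
    out = None
    for m, name in enumerate(MODALITIES):
        mask = modality_ids == m
        if not mask.any():
            continue
        y = fn(experts[name], x[mask])
        if out is None:
            out = np.zeros((x.shape[0], y.shape[-1]))
        out[mask] = y
    return out


def mot_block(x, modality_ids, experts):
    """Single-head MoT block: per-modality weights, shared global attention."""
    q = route(x, modality_ids, experts, lambda e, t: t @ e["wq"])
    k = route(x, modality_ids, experts, lambda e, t: t @ e["wk"])
    v = route(x, modality_ids, experts, lambda e, t: t @ e["wv"])
    # Global self-attention: every token attends to tokens of all modalities.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores) @ v
    x = x + route(attn, modality_ids, experts, lambda e, t: t @ e["wo"])
    # Modality-specific feed-forward (ReLU) with a residual connection.
    x = x + route(
        x, modality_ids, experts,
        lambda e, t: np.maximum(t @ e["w1"], 0.0) @ e["w2"],
    )
    return x


rng = np.random.default_rng(0)
experts = {name: init_expert(rng, 16, 32) for name in MODALITIES}
# Toy sequence: 4 video tokens, 2 language tokens, 3 action tokens.
modality_ids = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2])
tokens = rng.normal(size=(9, 16))
out = mot_block(tokens, modality_ids, experts)
print(out.shape)
```

In this sketch the streams exchange information only through the shared attention step, while each expert's feed-forward weights touch only its own tokens. That separation is one plausible way a single model could keep video, language, and action processing specialized while still being trained jointly.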
Motubrain learns from a wide range of sources: unlabeled videos, task recordings without language annotations, and data from multiple robot embodiments. Its proprietary latent-action framework extracts physical motion directly from video at scale (human footage, simulation data, and multi-robot trajectories) without manual labeling. The system supports task sequences of up to 10 atomic actions and is designed to transfer skills across robot embodiments.
At launch, Motubrain ranked first on two leading robotics benchmarks: RoboTwin 2.0 (average score of 96.0 across 50 tasks) and WorldArena (EWM score of 63.77). ShengShu Technology has formed strategic partnerships with Astribot, Shenpu Intelligence, and Wujie Power to develop the surrounding ecosystem. According to ShengShu, several robotics companies are already using Motubrain in their training programs.