Mind-0

MindOn's universal robot foundation model — a single VLA driving heterogeneous platforms (humanoids, dual-arm rigs), trained exclusively on human-centric data.

🔬 Research only, Robotics foundation model, Vision-Language-Action model

Release date

18 June 2026

Producer: 🏢 MindOne Robotics

Deployment: 📱 On-device

Overview

Mind-0 is an embodied-AI foundation model for robotics built by the Shenzhen-based Chinese startup MindOne Robotics (MindOn). It is a Vision-Language-Action (VLA) model designed as a single "mind" that drives heterogeneous hardware platforms, from Unitree G1 humanoids to stationary dual-arm rigs. The core thesis of Mind-0 is that, instead of training a separate model per platform on expensive teleoperation data, one can train a single model on human-centric data (whole-body motion capture, egocentric cameras, handheld devices) and have it generalize across embodiments.

Two-layer architecture

Mind-0 decouples intelligence from embodiment. The high-level layer handles scene understanding, task reasoning, and behavior generation. The low-level Whole-Body Action Foundation Model — trained on tens of thousands of hours of motion-capture data — translates intentions into physical motion respecting each robot's dynamics, achieving sub-3 cm end-effector tracking accuracy while maintaining global motion coherence and balance.
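The split between the two layers can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than Mind-0's actual interface: the `Intent` and `Embodiment` types, the reach and step limits, and the clamping rule are hypothetical stand-ins for a learned low-level model. It only shows the idea that one embodiment-independent intent is executed differently depending on each robot's limits.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """Embodiment-independent output of the high-level layer (hypothetical type)."""
    skill: str
    target_xyz: tuple  # desired end-effector position in metres, robot base frame


@dataclass(frozen=True)
class Embodiment:
    """Hypothetical per-robot limits that the low-level layer must respect."""
    name: str
    max_reach_m: float  # radius of the reachable workspace
    max_step_m: float   # largest end-effector displacement per control tick


def low_level_step(intent, current_xyz, body):
    """Move towards the intent's target, clamped to reach and per-tick step limits."""
    # Pull the target back onto the reach sphere if it lies outside it.
    norm = math.sqrt(sum(c * c for c in intent.target_xyz))
    scale = min(1.0, body.max_reach_m / norm) if norm > 0 else 1.0
    goal = tuple(c * scale for c in intent.target_xyz)

    # Limit how far the end effector may travel in a single tick.
    delta = tuple(g - c for g, c in zip(goal, current_xyz))
    dist = math.sqrt(sum(d * d for d in delta))
    if dist <= body.max_step_m:
        return goal
    k = body.max_step_m / dist
    return tuple(c + d * k for c, d in zip(current_xyz, delta))


# The same intent is handed to two different (made-up) embodiments.
intent = Intent(skill="reach", target_xyz=(1.0, 0.0, 0.0))
humanoid = Embodiment(name="humanoid", max_reach_m=0.6, max_step_m=0.05)
rig = Embodiment(name="dual_arm_rig", max_reach_m=0.9, max_step_m=0.10)

humanoid_next = low_level_step(intent, (0.0, 0.0, 0.0), humanoid)
rig_next = low_level_step(intent, (0.0, 0.0, 0.0), rig)
```

In this toy version the high-level layer never sees `max_reach_m` or `max_step_m`; only the low-level function does, which is the decoupling the section describes.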

Cross-Embodiment Data Pipeline

The cross-embodiment pipeline converts large-scale human demonstrations into action representations executable by different robots, effectively transferring human dexterity to hardware with fundamentally different kinematics, dynamics, and workspaces.
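One simple way to think about converting human motion into robot-executable targets is geometric retargeting. The sketch below is an assumption for illustration only: the source does not describe how Mind-0's pipeline works internally, and the function name `retarget_wrist_trajectory` and all numbers are hypothetical. It rescales a human wrist trajectory, expressed relative to the shoulder, by the ratio of robot to human arm length.

```python
def retarget_wrist_trajectory(human_wrists, human_shoulder, human_arm_len_m,
                              robot_shoulder, robot_arm_len_m):
    """Map human wrist positions to robot wrist targets by arm-length scaling.

    Each position is an (x, y, z) tuple in metres. This is a toy geometric
    retargeting rule, not the method used by Mind-0.
    """
    scale = robot_arm_len_m / human_arm_len_m
    targets = []
    for wrist in human_wrists:
        # Express the wrist relative to the human shoulder, rescale it,
        # then re-anchor it at the robot's shoulder.
        target = tuple(
            rs + scale * (w - hs)
            for w, hs, rs in zip(wrist, human_shoulder, robot_shoulder)
        )
        targets.append(target)
    return targets


# Made-up example: a fully extended 0.6 m human arm mapped to a 0.3 m robot arm.
robot_targets = retarget_wrist_trajectory(
    human_wrists=[(0.6, 0.0, 1.4), (0.0, 0.6, 1.4)],
    human_shoulder=(0.0, 0.0, 1.4),
    human_arm_len_m=0.6,
    robot_shoulder=(0.0, 0.0, 1.0),
    robot_arm_len_m=0.3,
)
```

A real pipeline would also have to handle joint limits, dynamics, and differing workspaces, which pure scaling ignores.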

Real-World Execution Compensation Model

A lightweight compensation model trained on a small amount of real deployment data closes the sim-to-real gap. It corrects tracking errors, dynamics mismatch, and embodiment-specific deviations, reportedly achieving sub-1 cm manipulation accuracy on the Unitree G1 — a platform typically known for limited arm precision.
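The idea of correcting systematic execution error from a small amount of real data can be illustrated with a deliberately simple stand-in. The source does not say what form the compensation model takes; the sketch below assumes, purely for illustration, a one-axis affine error (`measured = gain * commanded + offset`) fitted by ordinary least squares. The function names and the sample data are hypothetical.

```python
def fit_affine(commanded, measured):
    """Least-squares fit of measured = gain * commanded + offset for one axis."""
    n = len(commanded)
    mean_c = sum(commanded) / n
    mean_m = sum(measured) / n
    var_c = sum((c - mean_c) ** 2 for c in commanded)
    cov = sum((c - mean_c) * (m - mean_m) for c, m in zip(commanded, measured))
    gain = cov / var_c
    offset = mean_m - gain * mean_c
    return gain, offset


def compensate(desired, gain, offset):
    """Return the command that should make the robot land on `desired`."""
    return (desired - offset) / gain


# Made-up deployment log: the arm undershoots by 10% and carries a 2 cm bias.
commanded = [0.10, 0.20, 0.30, 0.40]
measured = [0.9 * c + 0.02 for c in commanded]

gain, offset = fit_affine(commanded, measured)
corrected_command = compensate(0.30, gain, offset)

# Simulate the same (hypothetical) robot executing the corrected command.
achieved = 0.9 * corrected_command + 0.02
```

A learned compensation model would presumably capture nonlinear and state-dependent errors; the affine fit only shows the general pattern of fitting on a few real samples and pre-correcting commands.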

Hierarchical Coordination Reasoning

Human data is inherently delay-free, while robots suffer from perception and control latency. Mind-0 addresses this with a hierarchical reasoning loop — the high-level policy continuously monitors low-level feedback and adaptively decides when and how to invoke specific skills, rather than directly imitating human demonstrations.
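The monitoring loop described above can be sketched as a small simulation. This is a hedged illustration, not Mind-0's controller: the function `run_hierarchical_loop`, the skill names, the 1 cm tolerance, and the feedback values are all assumptions. At each tick the high-level policy issues the current skill, reads the tracking error reported by the low level, and moves on to the next skill only once that error falls below the tolerance.

```python
def run_hierarchical_loop(skills, feedback_errors_m, tolerance_m=0.01):
    """Simulate a high-level policy that gates skill changes on low-level feedback.

    skills: ordered list of skill names to execute.
    feedback_errors_m: tracking error (metres) reported by the low level per tick.
    Returns (log, done), where log is a list of (tick, skill) decisions.
    """
    log = []
    idx = 0
    for tick, err in enumerate(feedback_errors_m):
        if idx >= len(skills):
            break
        # Issue (or re-issue) the current skill on this tick.
        log.append((tick, skills[idx]))
        # Advance only when the low level reports it has converged.
        if err < tolerance_m:
            idx += 1
    return log, idx == len(skills)


# Made-up feedback: "reach" needs three ticks to converge, "grasp" needs two.
log, done = run_hierarchical_loop(
    skills=["reach", "grasp"],
    feedback_errors_m=[0.05, 0.02, 0.005, 0.03, 0.004],
)
```

Because the decision depends on measured feedback rather than on the timing of the original demonstration, a slower robot simply spends more ticks on each skill.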

Public demonstrations

Mind-0's first viral demo (November 2025) showed a Unitree G1 autonomously performing complex household chores with no speed-ups and no teleoperation. The second (18 June 2026) showcased a heterogeneous fleet of two Unitree G1 humanoids and two stationary dual-arm rigs running an end-to-end logistics workflow (shelf picking, transport, sorting, packing, tape sealing), with all four robots driven by a single Mind-0 model.

Classification

Robotics foundation model, Vision-Language-Action model

Applications

Robotic manipulation, Robot policy training

Access & deployment

On-device

Weights: Closed

Key parameters

📥 Input: robot sensors, robot state data, image, video

Robotics

Robot manipulation, Bimanual manipulation, Dexterous manipulation, Robot control, Robot navigation, Motion planning, Scene understanding, Embodied task planning

Technical specification

License

Proprietary (closed)

Hardware requirements

Deployed on commercial Unitree G1 humanoids and stationary dual-arm rigs (embodiment-agnostic architecture).

Modalities

⬇ Input

robot_sensors, robot_state_data, image, video

⬆ Output

robot_actions, robot_commands, motion_trajectories, manipulator_control

Capabilities and applications

Native model capabilities

Cross-embodiment transfer

The ability of a single model to control robots with different morphologies (humanoids, dual-arm rigs, mobile platforms) without training a separate model per platform. Intelligence is decoupled from embodiment, so the same policy runs on hardware with different kinematics and dynamics.

Category: robotics

Vision-language-action grounding

The ability of a VLA model to ground visual perception and a language instruction into a concrete physical robot action. The model understands the scene and intent, then generates an executable action sequence, closing the loop from observation to motion.

Category: robotics

Planning

Forming and executing action plans for complex tasks.

Category: planning

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Multimodal understanding

Category: multimodal


Application domains