MindOne: one AI model controls humanoids and dual-arm robots without teleoperation

On June 18, 2026, Chinese startup MindOne Robotics demonstrated a logistics workflow in which two Unitree G1 humanoids and two stationary dual-arm robots collaborate end-to-end. All four machines are controlled by a single AI model trained on human-centric data rather than teleoperation recordings; apart from a small real-world dataset used to train a compensation model, no robot-collected training data was used.

Key takeaways

A single AI model, Mind-0, simultaneously controls humanoids and stationary arms with different kinematics
Trained on human-centric data instead of teleoperation; robot data is limited to a small real-world set for the compensation model
Architecture: high-level reasoning layer + low-level whole-body control layer
Sim-to-real compensation model reportedly brings manipulation error on the Unitree G1 below 1 cm
MindOne Robotics was incorporated in Shenzhen in May 2025; its Unitree G1 household demo went viral in November 2025

A heterogeneous fleet instead of one universal robot

Most robotics companies are racing toward a single platform that can do everything. MindOne Robotics started from a different premise: real industrial environments will always mix hardware with different trade-offs.

Humanoids move freely through infrastructure designed for people: they climb onto conveyors, open doors and retrieve products from shelves. Stationary dual-arm robots are faster, more repeatable and cheaper to operate for structured tasks such as sorting, packing and conveyor work.

In the demo, G1 robots retrieved items from shelves and transported them to workstations. Stationary arms then took over: sorting, packing and sealing cartons. The full workflow — from shelf to finished parcel — required no human intervention.
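
The handoff between the two robot types can be sketched as a simple order planner. This is a hypothetical illustration, not MindOne's scheduler: the stage names, the `plan_orders` function and the round-robin policy are assumptions chosen to mirror the demo, with two humanoids fetching and two dual-arm stations packing.

```python
from itertools import cycle

# Hypothetical stage lists per embodiment, taken from the demo description.
HUMANOID_STAGES = ["retrieve_from_shelf", "transport_to_workstation"]
ARM_STAGES = ["sort", "pack", "seal_carton"]


def plan_orders(order_ids, humanoids, arm_stations):
    """Assign each order to one humanoid (fetch) and one arm station (pack).

    Robots are rotated round-robin so work spreads across the fleet; the
    same humanoid both retrieves and transports an order, then hands off.
    """
    humanoid_pool, arm_pool = cycle(humanoids), cycle(arm_stations)
    plan = []
    for order_id in order_ids:
        humanoid, station = next(humanoid_pool), next(arm_pool)
        steps = [(stage, humanoid) for stage in HUMANOID_STAGES]
        steps += [(stage, station) for stage in ARM_STAGES]  # handoff point
        plan.append((order_id, steps))
    return plan


plan = plan_orders(["o1", "o2", "o3"], ["g1_a", "g1_b"], ["arm_1", "arm_2"])
for order_id, steps in plan:
    print(order_id, steps)
```

Fixing the embodiment per stage keeps the plan trivial; a real system would also account for robot availability and travel time.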

Why human data, not teleoperation?

The standard way to collect robot training data is teleoperation: a human pilots the robot while the hardware records its movements. The problem is that the operator must adapt to the latency, range of motion and kinematic constraints of the specific machine, so the recorded demonstrations end up stiff and suboptimal.

MindOne collects data differently: with egocentric cameras, handheld devices and full-body motion capture, recording how a human performs a task naturally. A Cross-Embodiment pipeline then converts this data into representations executable by different robots.
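
As a rough illustration of what one cross-embodiment step has to do, the sketch below retargets a human wrist position into a smaller robot's workspace. It scales the shoulder-to-wrist offset by the ratio of arm lengths and clamps the result to the robot's reach. The article does not describe MindOne's pipeline; `retarget_wrist`, the scale-and-clamp rule and the `max_reach_frac` margin are assumptions, and a real pipeline would also handle orientation, joint limits and balance.

```python
import math


def retarget_wrist(human_shoulder, human_wrist, human_arm_len,
                   robot_shoulder, robot_arm_len, max_reach_frac=0.95):
    """Map a human wrist position into a robot's workspace (hypothetical rule).

    Positions are (x, y, z) tuples in metres. The shoulder-to-wrist offset is
    scaled by robot_arm_len / human_arm_len, then clamped so the target stays
    inside a sphere of radius max_reach_frac * robot_arm_len.
    """
    scale = robot_arm_len / human_arm_len
    offset = [scale * (w - s) for w, s in zip(human_wrist, human_shoulder)]
    dist = math.sqrt(sum(c * c for c in offset))
    limit = max_reach_frac * robot_arm_len
    if dist > limit:  # pull unreachable targets back onto the reach boundary
        offset = [c * limit / dist for c in offset]
    return tuple(r + c for r, c in zip(robot_shoulder, offset))


# Human arm 0.6 m, robot arm 0.4 m, robot shoulder 0.4 m lower.
near = retarget_wrist((0.0, 0.0, 1.4), (0.5, 0.0, 1.4), 0.6, (0.0, 0.0, 1.0), 0.4)
far = retarget_wrist((0.0, 0.0, 1.4), (0.9, 0.0, 1.4), 0.6, (0.0, 0.0, 1.0), 0.4)
print(near, far)
```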

The Whole-Body Action Foundation Model — trained on tens of thousands of hours of motion capture data — handles low-level motion tracking while maintaining balance and physical feasibility, achieving sub-3 cm end-effector accuracy.
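
End-effector accuracy figures such as "sub-3 cm" are typically computed as the distance between commanded and measured end-effector positions. The article does not say whether MindOne quotes a mean or a worst case, so this hypothetical helper computes both; the sample positions are made up for illustration.

```python
import math


def end_effector_errors(targets, achieved):
    """Euclidean distance in metres between each commanded and measured position."""
    if len(targets) != len(achieved):
        raise ValueError("targets and achieved must have the same length")
    return [math.dist(t, a) for t, a in zip(targets, achieved)]


def summarize_cm(errors_m):
    """Mean and worst-case error, converted to centimetres."""
    errors_cm = [e * 100 for e in errors_m]
    return {"mean_cm": sum(errors_cm) / len(errors_cm), "max_cm": max(errors_cm)}


errors = end_effector_errors(
    [(0.40, 0.10, 0.90), (0.35, -0.05, 0.95)],  # commanded positions (m)
    [(0.42, 0.11, 0.90), (0.35, -0.05, 0.97)],  # measured positions (m)
)
stats = summarize_cm(errors)
print(stats)
```

Whether a result counts as "sub-3 cm" depends on which statistic is reported; in this made-up sample both the mean and the maximum qualify.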

The second challenge is the sim-to-real gap: models that work well in simulation often fail on physical hardware. MindOne addresses it with a lightweight compensation model trained on a small set of real-world deployment data. On the Unitree G1, a platform known for limited arm precision, this model reportedly brings manipulation error below 1 cm.
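
A lightweight compensation model can be as simple as a correction fitted on a handful of real measurements. The sketch below swaps in the most basic version: a per-axis affine fit (observed ≈ gain * commanded + bias) solved by least squares and inverted to pre-distort commands. MindOne's model reportedly corrects real-time dynamics errors and its form is not described; `AffineCompensator` and the simulated `fake_robot` (3% gain loss, 2 cm offset) are hypothetical.

```python
import math


def fake_robot(cmd):
    """Hypothetical stand-in for hardware: 3% gain loss and a 2 cm offset per axis."""
    return tuple(0.97 * c + 0.02 for c in cmd)


def fit_axis(commanded, observed):
    """Ordinary least squares for observed ≈ gain * commanded + bias on one axis."""
    n = len(commanded)
    mean_c = sum(commanded) / n
    mean_o = sum(observed) / n
    cov = sum((c - mean_c) * (o - mean_o) for c, o in zip(commanded, observed))
    var = sum((c - mean_c) ** 2 for c in commanded)  # must be > 0: vary each axis
    gain = cov / var
    return gain, mean_o - gain * mean_c


class AffineCompensator:
    """Per-axis affine correction fitted from a small real-world calibration set."""

    def __init__(self, commanded_pts, observed_pts):
        self.params = [
            fit_axis([p[i] for p in commanded_pts], [p[i] for p in observed_pts])
            for i in range(3)
        ]

    def correct(self, target):
        """Invert the fitted model so the predicted landing point equals target."""
        return tuple((t - bias) / gain for t, (gain, bias) in zip(target, self.params))


# Calibrate on five measured moves, then compare raw vs compensated error.
calib = [(0.1, 0.2, 0.5), (0.2, -0.1, 0.7), (0.3, 0.0, 0.9), (0.4, 0.1, 1.1), (0.25, -0.2, 0.6)]
comp = AffineCompensator(calib, [fake_robot(p) for p in calib])
target = (0.30, -0.10, 0.90)
raw_err = math.dist(fake_robot(target), target)
comp_err = math.dist(fake_robot(comp.correct(target)), target)
print(f"raw {raw_err * 100:.1f} cm, compensated {comp_err * 100:.1f} cm")
```

Here the raw error is about 2.6 cm and the compensated error is effectively zero, but only because the simulated error is exactly affine. Real dynamics errors are not, which is why a richer learned model would be needed in practice.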

How Mind-0 works under the hood

The architecture has four components that together bridge human behavior and robot execution.

The Cross-Embodiment Data Pipeline translates human demonstrations into the action spaces of different robots. The Whole-Body Foundation Model provides low-level motion tracking. The Execution Compensation Model corrects dynamics errors in real time. Hierarchical Coordination Reasoning addresses latency: human demonstrations contain no actuation delay, but physical robots do, so the high-level framework monitors low-level feedback and adaptively decides when and how to invoke specific skills.

This last element is technically significant. Without synchronization between the two levels, the model can issue the next command before the arm has reached its position, and the whole sequence breaks down. Rather than building a separate model for each robot, MindOne built a single intelligence layer designed to operate across embodiments.
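
The failure mode and its fix can be reproduced with a toy one-dimensional arm that moves at most 5 cm per control tick. An open-loop sequencer that assumes instant execution, as human demonstrations implicitly do, moves on before each waypoint is reached; a feedback-gated sequencer waits for the low-level state to confirm arrival. `LaggyArm`, both runner functions and the tolerance values are hypothetical, not Mind-0's actual coordination logic.

```python
from dataclasses import dataclass


@dataclass
class LaggyArm:
    """Toy 1-D arm: moves at most max_step metres toward its target per tick."""
    position: float = 0.0
    target: float = 0.0
    max_step: float = 0.05

    def command(self, target: float) -> None:
        self.target = target

    def tick(self) -> None:
        delta = self.target - self.position
        self.position += max(-self.max_step, min(self.max_step, delta))


def run_open_loop(arm, waypoints, ticks_per_skill=1, tol=1e-3):
    """Issue each waypoint on a fixed schedule, as if execution were instant."""
    reached = 0
    for wp in waypoints:
        arm.command(wp)
        for _ in range(ticks_per_skill):
            arm.tick()
        if abs(arm.position - wp) < tol:
            reached += 1
    return reached


def run_feedback_gated(arm, waypoints, tol=1e-3, timeout=100):
    """Dispatch the next skill only after low-level feedback confirms arrival."""
    reached = 0
    for wp in waypoints:
        arm.command(wp)
        for _ in range(timeout):
            arm.tick()
            if abs(arm.position - wp) < tol:
                reached += 1
                break
        else:  # loop finished without break: the arm never got there
            raise TimeoutError(f"arm did not reach {wp}")
    return reached


waypoints = [0.2, 0.5, 0.3]
print(run_open_loop(LaggyArm(), waypoints), run_feedback_gated(LaggyArm(), waypoints))
```

With one tick per skill, the open-loop runner reaches none of the three waypoints, while the gated runner reaches all of them at the cost of waiting 4, 6 and 4 ticks respectively.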

Industry context

MindOne Robotics is not alone in exploring cross-embodiment learning. Google DeepMind with a consortium of academic labs (the Open X-Embodiment dataset and RT-X models) and Physical Intelligence (π0, "pi-zero") have proposed related approaches. All of them, however, rely on robot-collected datasets rather than human data alone.

A training pipeline built almost entirely on human data and demonstrated in a production-style workflow is relatively new. If MindOne's approach scales, it could significantly reduce the cost of preparing new robotics deployments, since teams would not need to spend months collecting data on new hardware before each project.

Why it matters

For years, industrial robotics followed a fixed formula: one production line, one robot type, hundreds of hours of programming and teleoperation data. Cross-embodiment learning with human data inverts that logic.

If one model can operate across different platforms without restarting the data-collection cycle, the time needed to deploy new robots shrinks dramatically. For e-commerce logistics, where warehouse layouts change quarterly, or for manufacturing with continuous product rotation, this would be an operational step change.

Removing teleoperation as the only data channel is valuable in its own right. Teleoperated data collection scales only linearly with the number of operators and robot platforms. Motion-capture data of humans can be collected in almost any environment, by any person, without specialized robotics hardware. That is a different order of scale.

What's next

MindOne plans to extend deployment to mobile dual-arm robots and additional platforms combining mobility and manipulation.
The company is scaling human-centric datasets and improving the model for long-horizon tasks — the current demo covers sequences of approximately 10 steps.
Independent verification remains an open question: all reported figures, including the sub-1 cm accuracy on the G1, come from MindOne's own measurements, and no external benchmark has confirmed them.