GE-1
AgiBot's video-generative world model for robot control (launched August 2025). Its closed-loop architecture combines video generation, policy learning, and simulation evaluation to support end-to-end reasoning from seeing, through thinking, to acting. It is paired with GO-1 in the G2 humanoid.
✓ Active | 🏢 Enterprise | World Model | Video generation
Release date
1 August 2025
Deployment: 📱 On-device, ☁ Cloud

Overview

GE-1 is a video-generative world model developed by the Chinese company AgiBot and released in August 2025. It is designed as a partner to the GO-1 foundation model in AgiBot's humanoid control stack. In the industrial G2 humanoid (launched October 16, 2025), GE-1 is responsible for predicting how a scene will evolve in time and space, allowing the robot to rehearse actions in a virtual environment before executing them in the real world.

Closed-loop architecture

GE-1 combines three components in a single closed loop: (1) video generation, which predicts future observation frames conditioned on robot actions; (2) policy learning, which uses simulated future scenarios to tune the control policy; and (3) simulation evaluation, which validates planned actions in the virtual world before physical execution. Together, these components support end-to-end reasoning from seeing, through thinking, to acting.
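The predict, evaluate, act cycle described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the one-dimensional "state", the additive dynamics, and the function names (`predict_rollout`, `evaluate_rollout`, `select_plan`, `closed_loop`) are hypothetical and do not come from AgiBot's software. The policy-learning component is left out; only action-conditioned prediction and evaluation before execution are shown.

```python
from typing import List, Sequence

# Hypothetical toy types: a scalar position stands in for video frames,
# and a scalar displacement stands in for a robot action.
State = float
Action = float


def predict_rollout(state: State, actions: Sequence[Action]) -> List[State]:
    """Toy world model: predict future observations conditioned on actions."""
    frames: List[State] = []
    for action in actions:
        state = state + action  # assumed additive dynamics, illustration only
        frames.append(state)
    return frames


def evaluate_rollout(frames: Sequence[State], goal: State) -> float:
    """Toy simulation evaluation: higher score means the rollout ends nearer the goal."""
    return -abs(frames[-1] - goal)


def select_plan(state: State, goal: State,
                candidates: Sequence[Sequence[Action]]) -> Sequence[Action]:
    """Rehearse every candidate plan in the toy model and keep the best one."""
    return max(
        candidates,
        key=lambda plan: evaluate_rollout(predict_rollout(state, plan), goal),
    )


def closed_loop(state: State, goal: State,
                candidates: Sequence[Sequence[Action]], steps: int) -> List[State]:
    """Re-plan at every step and execute only the first action of the chosen plan."""
    trajectory = [state]
    for _ in range(steps):
        plan = select_plan(state, goal, candidates)
        state = state + plan[0]  # "physical" execution, here the same toy dynamics
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    plans = [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
    print(closed_loop(0.0, 4.0, plans, steps=3))
```

Re-planning after every executed action is what closes the loop in this sketch: each new observation feeds the next round of prediction and evaluation.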

Pairing with GO-1

GE-1 does not replace GO-1; it complements it. GO-1 (ViLLA: VLM + Latent Planner + Action Expert) emits control signals for the current action, while GE-1 provides the prediction horizon as generated video and simulation. This two-model setup is the core of the G2 humanoid's AI. It runs locally on the NVIDIA Jetson Thor T5000 (2,070 TFLOPS FP4), with total control latency below 10 ms.
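One way to picture this division of labor is a control tick in which one component proposes the current action and the other supplies a short prediction horizon, with the elapsed time compared against a latency budget. The sketch below is an assumption throughout: `ActionPolicy`, `WorldModel`, `ToyPolicy`, `ToyWorldModel`, and `control_tick` are hypothetical names, the scalar observations are placeholders, and the source does not say how the 10 ms figure is measured or which stages it covers.

```python
import time
from dataclasses import dataclass
from typing import List, Protocol


class ActionPolicy(Protocol):
    """Hypothetical interface for the model that emits the current action."""
    def act(self, observation: float) -> float: ...


class WorldModel(Protocol):
    """Hypothetical interface for the model that predicts future observations."""
    def predict(self, observation: float, action: float, horizon: int) -> List[float]: ...


class ToyPolicy:
    """Stand-in policy: move toward a target, at most one unit per tick."""
    def __init__(self, target: float) -> None:
        self.target = target

    def act(self, observation: float) -> float:
        error = self.target - observation
        return max(-1.0, min(1.0, error))


class ToyWorldModel:
    """Stand-in predictor: assume the same action is repeated over the horizon."""
    def predict(self, observation: float, action: float, horizon: int) -> List[float]:
        return [observation + action * (k + 1) for k in range(horizon)]


@dataclass
class TickResult:
    action: float
    predicted: List[float]
    elapsed_s: float
    within_budget: bool


def control_tick(policy: ActionPolicy, world_model: WorldModel, observation: float,
                 horizon: int = 3, budget_s: float = 0.010) -> TickResult:
    """Run one tick: propose an action, predict its outcome, check the time budget."""
    start = time.perf_counter()
    action = policy.act(observation)
    predicted = world_model.predict(observation, action, horizon)
    elapsed = time.perf_counter() - start
    return TickResult(action, predicted, elapsed, elapsed <= budget_s)


if __name__ == "__main__":
    result = control_tick(ToyPolicy(target=5.0), ToyWorldModel(), observation=0.0)
    print(result.action, result.predicted, result.within_budget)
```

Keeping the two roles behind separate interfaces mirrors the statement that the predictor complements the action model rather than replacing it.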

Position in the field

GE-1 belongs to the broader wave of world models for robotics, in particular action-conditioned video generation, where generated predictions replace costly or unsafe physical trials. Similar approaches include NVIDIA Cosmos, Google Genie 3, and World Action Model. GE-1 stands out for its integration into a production stack (GO-1 + G2) and for its claimed industrial maturity: it is presented as more than a research prototype.

Classification
World Model, Video generation
Access & deployment
On-device, Cloud
Weights: Closed
Key parameters
📥 Input: image, video, robot sensors, robot state data
Robotics
Embodied task planning, Scene understanding, Spatial prediction, Environment modeling

Technical specification

License
Proprietary (closed)
Hardware requirements
Deployed locally on the NVIDIA Jetson Thor T5000 (2,070 TFLOPS FP4) in the AgiBot G2 humanoid, paired with GO-1. Training a generative video model requires data-center-class GPU clusters.
Modalities
⬇ Input
image, video, robot_sensors, robot_state_data
⬆ Output
video, robot_actions, motion_trajectories

Capabilities and applications

Native model capabilities
Video generation
The model's ability to generate video clips from a text prompt, image or another video, with control over length, resolution and visual characteristics.
Category: video
Video understanding
The model's ability to analyse and interpret video content — recognising actions, motion, events and relationships between objects over time.
Category: video
Planning
Forming and executing action plans for complex tasks.
Category: planning
Reasoning
The model's ability to reason logically and solve complex problems.
Category: reasoning
Robotics
Embodied task planning, Scene understanding, Spatial prediction, Environment modeling
