GE-1

AgiBot's video-generative world model for robot control (launched August 2025). Its closed-loop architecture combines video generation, policy learning and simulation evaluation to carry reasoning end to end, from seeing through thinking to acting. It is the partner to GO-1 in the G2 humanoid.

✓ Active · 🏢 Enterprise · World Model · Video generation

Release date

1 August 2025

Producer: 🏢 AGIBOT

Deployment: 📱 On-device, ☁ Cloud

Overview

GE-1 is a video-generative world model developed by the Chinese company AgiBot and released in August 2025. It is designed as a partner to the GO-1 foundation model in AgiBot's humanoid control stack. In the industrial G2 humanoid (launched October 16, 2025), GE-1 predicts how a scene will evolve in time and space, allowing the robot to rehearse actions in a virtual environment before executing them in the real world.

Closed-loop architecture

GE-1 combines three components in a single closed loop: (1) video generation, which predicts future observation frames conditioned on robot actions; (2) policy learning, which uses the simulated future scenarios to tune the control policy; and (3) simulation evaluation, which validates planned actions in the virtual world before physical execution. Together they carry reasoning end to end, from seeing through thinking to acting.
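The three-part loop can be illustrated with a toy sketch. It is not GE-1's implementation, whose interfaces are not public: a scalar position stands in for video frames, `ToyWorldModel`, `learn_policy`, `evaluate`, `closed_loop_step` and the candidate gains are all hypothetical, and "policy learning" is reduced to picking the best gain of a proportional controller from imagined rollouts. It shows only the ordering the text describes: predict, tune on predictions, and approve before anything would run on hardware.

```python
from dataclasses import dataclass

# Hypothetical candidate gains for the toy policy; not taken from GE-1.
CANDIDATE_GAINS = (0.1, 0.3, 0.6, 1.5, 2.5)


@dataclass
class ToyWorldModel:
    """Stand-in for action-conditioned video generation (component 1).

    A scalar position replaces video frames: the next observation is
    predicted from the current observation and the commanded action.
    """
    gain: float = 1.0

    def predict(self, obs: float, action: float) -> float:
        return obs + self.gain * action

    def rollout(self, obs: float, k: float, target: float, horizon: int) -> list[float]:
        # Imagine `horizon` future observations under the proportional
        # policy action = k * (target - obs); nothing touches hardware.
        frames = []
        for _ in range(horizon):
            obs = self.predict(obs, k * (target - obs))
            frames.append(obs)
        return frames


def learn_policy(model: ToyWorldModel, obs: float, target: float, horizon: int) -> float:
    # Component 2: tune the policy on imagined futures only, picking the
    # gain whose rollout ends closest to the target.
    return min(
        CANDIDATE_GAINS,
        key=lambda k: abs(model.rollout(obs, k, target, horizon)[-1] - target),
    )


def evaluate(frames: list[float], target: float, tol: float, limit: float) -> bool:
    # Component 3: approve the plan only if every imagined state stays
    # inside a safety limit and the final state is close to the target.
    return all(abs(f) <= limit for f in frames) and abs(frames[-1] - target) <= tol


def closed_loop_step(model: ToyWorldModel, obs: float, target: float,
                     horizon: int = 5, tol: float = 0.05, limit: float = 2.0):
    k = learn_policy(model, obs, target, horizon)
    frames = model.rollout(obs, k, target, horizon)
    # Only an approved (k, True) plan would be sent to the physical robot.
    return k, evaluate(frames, target, tol, limit)


print(closed_loop_step(ToyWorldModel(), obs=1.0, target=0.0))             # (0.6, True)
print(closed_loop_step(ToyWorldModel(), obs=1.0, target=0.0, horizon=1))  # (0.6, False)
```

With a one-step horizon the best available gain still leaves the imagined state far from the target, so the evaluation stage rejects the plan before execution, which is the safety benefit the closed loop is meant to provide.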

Pairing with GO-1

GE-1 does not replace GO-1; it complements it. GO-1 (ViLLA: VLM + Latent Planner + Action Expert) emits control signals for the current action, while GE-1 supplies the prediction horizon in the form of generated video and simulation. This two-model setup is the core of the AI in the G2 humanoid. It runs locally on the NVIDIA Jetson Thor T5000 (2,070 TFLOPS FP4) with a stated total control latency below 10 ms.
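The division of labour can be sketched as one control cycle: a policy (GO-1's role) returns the action for the current step, and a world model (GE-1's role) rolls that policy forward to preview the next few observations. This is a minimal sketch under stated assumptions: `Policy`, `WorldStep`, `imagine` and `control_tick` are hypothetical names, both models are plain callables on a scalar state, and the 10 ms value is used only as a budget for this toy code. As arithmetic, a 10 ms cycle corresponds to at least 100 control updates per second.

```python
import time
from typing import Callable

Policy = Callable[[float], float]            # observation -> action (GO-1's role)
WorldStep = Callable[[float, float], float]  # (observation, action) -> next observation (GE-1's role)


def imagine(world_step: WorldStep, policy: Policy, obs: float, horizon: int) -> list[float]:
    # Prediction horizon: roll the policy forward inside the world model
    # instead of on the physical robot.
    frames = []
    for _ in range(horizon):
        obs = world_step(obs, policy(obs))
        frames.append(obs)
    return frames


def control_tick(policy: Policy, world_step: WorldStep, obs: float,
                 horizon: int = 3, budget_s: float = 0.010):
    """One control cycle: the current action plus an imagined preview.

    `budget_s` mirrors the stated sub-10 ms latency (1 / 0.010 s = 100 Hz).
    The timing check measures only this toy code, not real hardware.
    """
    start = time.perf_counter()
    action = policy(obs)                                 # act now
    preview = imagine(world_step, policy, obs, horizon)  # look ahead
    within_budget = (time.perf_counter() - start) <= budget_s
    return action, preview, within_budget


action, preview, ok = control_tick(lambda x: -0.5 * x, lambda x, a: x + a, obs=8.0)
print(action, preview)  # -4.0 [4.0, 2.0, 1.0]
```

Keeping the two roles behind separate callables reflects the text's point that GE-1 complements rather than replaces GO-1: either side could be swapped without changing the cycle itself.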

Position in the field

GE-1 is part of a broader wave of world models for robotics, including action-conditioned video generation models, in which generated predictions replace costly or unsafe physical trials. Similar approaches include NVIDIA Cosmos, Google Genie 3 and World Action Model. GE-1 stands out because it is integrated into a production-ready stack (GO-1 + G2) and because of its claimed industrial maturity: the model is not just a research prototype.

Classification

World Model · Video generation

Applications

Robot policy training · Robotic manipulation

Access & deployment

On-device · Cloud

Weights: Closed

Key parameters

📥 Input: image, video, robot sensors, robot state data

Robotics

Embodied task planning · Scene understanding · Spatial prediction · Environment modeling

Technical specification

License

Proprietary (closed)

Hardware requirements

Deployed locally on NVIDIA Jetson Thor T5000 (2,070 TFLOPS FP4) in the AGIBOT G2 humanoid, paired with GO-1. Training a generative video model requires data-center-class GPU clusters.

Modalities

⬇ Input

image, video, robot_sensors, robot_state_data

⬆ Output

video, robot_actions, motion_trajectories
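The modality lists above can be read as an input/output contract. The sketch below writes that contract as Python dataclasses. Only the field names come from the lists; the class names `WorldModelInput` and `WorldModelOutput`, the `Frame` type, the field types, the `horizon` helper and the rule that frames, actions and waypoints align one-to-one are all hypothetical assumptions, not AgiBot's API.

```python
from dataclasses import dataclass, field

Frame = list[list[float]]  # hypothetical: one grayscale H x W frame


@dataclass
class WorldModelInput:
    # Field names follow the catalog's input modality list.
    image: Frame
    video: list[Frame] = field(default_factory=list)
    robot_sensors: dict[str, float] = field(default_factory=dict)
    robot_state_data: dict[str, float] = field(default_factory=dict)


@dataclass
class WorldModelOutput:
    # Field names follow the catalog's output modality list.
    video: list[Frame]
    robot_actions: list[list[float]]        # one action vector per future step
    motion_trajectories: list[list[float]]  # one waypoint per future step

    def horizon(self) -> int:
        # Assumption: predicted frames, actions and waypoints are aligned
        # one-to-one along the prediction horizon.
        n = len(self.video)
        if len(self.robot_actions) != n or len(self.motion_trajectories) != n:
            raise ValueError("video, robot_actions and motion_trajectories must share one horizon")
        return n


blank = [[0.0, 0.0], [0.0, 0.0]]
inp = WorldModelInput(image=blank, robot_state_data={"joint_0": 0.1})
out = WorldModelOutput(
    video=[blank, blank],
    robot_actions=[[0.1, 0.0], [0.05, 0.0]],
    motion_trajectories=[[0.0, 0.0, 0.5], [0.0, 0.0, 0.45]],
)
print(out.horizon())  # 2
```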

Capabilities and applications

Native model capabilities

Video generation

The model's ability to generate video clips from a text prompt, image or another video, with control over length, resolution and visual characteristics.

Category: video

Video understanding

The model's ability to analyse and interpret video content — recognising actions, motion, events and relationships between objects over time.

Category: video

Planning

Forming and executing action plans for complex tasks.

Category: planning

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning
