Gemini Robotics 1.5

Version 1.5 · Family: Gemini
Vision-Language-Action (VLA) model by Google DeepMind that converts visual inputs and language instructions into motor commands for robots.
โณ Previewโณ Limited accessMultimodalRobotics foundation modelVision-Language-Action model๐Ÿ“ Gemini
Context window
32K
tokens
Release date
14 April 2026
Access:HostedDeployment:โ˜ Cloud

Overview

Gemini Robotics 1.5 is Google DeepMind's latest Vision-Language-Action (VLA) model, building on the original Gemini Robotics. It takes visual input (robot camera images) and text instructions, and outputs motor commands that control robot joints. This is the key distinction from vision-language models (VLMs) and large language models (LLMs): rather than describing what it sees, the model directly controls physical motion.

The model generalizes to new instructions, actions, and visual contexts, and a single model can operate across diverse robotic platforms (ALOHA, Bi-arm Franka, and the Apptronik Apollo humanoid). Paired with Gemini Robotics-ER 1.6, it forms a complete system for physical robot control.
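The input/output contract described in this overview (camera image plus text instruction in, motor command out) can be sketched as a simple control loop. This is a minimal illustration only: `Observation`, `Policy`, `run_episode`, and `dummy_policy` are hypothetical names, the 7-joint action shape is an assumption, and nothing here reflects the actual Gemini Robotics API, which this page does not document.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Observation:
    image: bytes       # one camera frame (raw bytes; the encoding is an assumption)
    instruction: str   # natural-language task instruction


# Hypothetical policy type: maps one observation to a list of joint targets.
Policy = Callable[[Observation], List[float]]


def run_episode(policy: Policy, frames: Sequence[bytes], instruction: str) -> List[List[float]]:
    """Query the policy once per camera frame and collect the resulting actions."""
    actions = []
    for frame in frames:
        obs = Observation(image=frame, instruction=instruction)
        actions.append(policy(obs))
    return actions


def dummy_policy(obs: Observation) -> List[float]:
    # Stand-in for a VLA model: always returns zeros for an assumed 7-joint arm.
    return [0.0] * 7


if __name__ == "__main__":
    out = run_episode(dummy_policy, [b"frame0", b"frame1"], "pick up the cup")
    print(len(out), len(out[0]))
```

A real deployment would replace `dummy_policy` with a call to the hosted model and send each returned action to the robot controller; the loop structure is the only part this sketch is meant to convey.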

Classification
Multimodal · Robotics foundation model · Vision-Language-Action model
Family: Gemini
Access & deployment
Access: Hosted
Deployment: Cloud
Weights: Closed
Key parameters
Context: 32K tokens
Input: text, image
Robotics
Dexterous manipulation · Robot manipulation · Robot control · Embodied task planning · Visual grounding · Bimanual manipulation · Motion planning

Technical specification

Context window: 32K tokens
Modalities
Input: text, image
Output: text, action

Capabilities and applications

Native model capabilities
Reasoning
The model's ability to reason logically and solve complex problems.
Category: reasoning
Multi-step reasoning
Carrying out multi-step chains of reasoning across long, complex tasks.
Category: reasoning
Planning
Forming and executing action plans for complex tasks.
Category: planning
Image understanding
Analysing and interpreting the content of images.
Category: vision
Multimodal understanding
Category: multimodal
Multilingual
Understanding and generating text in many languages.
Category: language
Robotics
Dexterous manipulation · Robot manipulation · Robot control · Embodied task planning · Visual grounding · Bimanual manipulation · Motion planning

Benchmark results

5 benchmarks
Generalization: In-Distribution (internal)
Metric: progress score, robotic manipulation tasks
Score: 0.83 (scale 0-1)
Source: https://deepmind.google/models/gemini-robotics/gemini-robotics/
Gemini Robotics 1.5 vs. prior versions. The score of 0.83 outperforms Gemini Robotics and Gemini Robotics On-Device.
Generalization: Instruction Generalization (internal)
Metric: progress score
Score: 0.76 (scale 0-1)
Source: https://deepmind.google/models/gemini-robotics/gemini-robotics/
Generalization: Action Generalization (internal)
Metric: progress score
Score: 0.54 (scale 0-1)
Source: https://deepmind.google/models/gemini-robotics/gemini-robotics/
Generalization: Visual Generalization (internal)
Metric: progress score
Score: 0.81 (scale 0-1)
Source: https://deepmind.google/models/gemini-robotics/gemini-robotics/
Generalization: Task Generalization (internal)
Metric: progress score
Score: 0.70 (scale 0-1)
Source: https://deepmind.google/models/gemini-robotics/gemini-robotics/
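
To compare the five progress scores above at a glance, they can be loaded into a small Python structure. The unweighted mean computed here is not a metric reported by the source; it is only a convenience summary, and the variable names are illustrative.

```python
# Progress scores (scale 0-1) as listed in the benchmark results above.
scores = {
    "In-Distribution": 0.83,
    "Instruction Generalization": 0.76,
    "Action Generalization": 0.54,
    "Visual Generalization": 0.81,
    "Task Generalization": 0.70,
}

# Unweighted mean across the five categories (an assumption, not an official aggregate).
mean_score = sum(scores.values()) / len(scores)

# Category with the lowest and highest reported score.
weakest = min(scores, key=scores.get)
strongest = max(scores, key=scores.get)

if __name__ == "__main__":
    print(f"mean={mean_score:.3f} weakest={weakest} strongest={strongest}")
```

By these figures, action generalization is the weakest of the five categories and in-distribution performance the strongest.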