Gemini Robotics-ER 1.6

1.6 · Family: Gemini
Vision-Language Model by Google DeepMind with advanced spatial and embodied reasoning, designed for robotics applications.
Preview · Limited access · Multimodal · Robotics foundation model · Gemini
Context window: 128K tokens
Max output: 64,000 tokens
Release date: 14 April 2026
Access: API, Hosted · Deployment: Cloud

Overview

Gemini Robotics-ER 1.6 (Embodied Reasoning) is a Vision-Language Model (VLM) developed by Google DeepMind, built on the Gemini 3.0 Flash architecture. It specializes in spatial and physical reasoning for robotics, including precision pointing, task planning, success detection, and industrial instrument reading.

The model processes text, image, audio, and video inputs (up to 128K token context) and generates text outputs. It can natively call external tools (Google Search, VLA models, user-defined functions) and combine visual reasoning with code execution (agentic vision). It serves as a high-level reasoning module in robotic systems and does not directly generate motor control commands.
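The division of labour described above, where the model plans and calls tools while a separate component produces motor commands, can be sketched as a small dispatch loop. This is a minimal sketch under stated assumptions, not the Gemini API: the `ToolCall` structure, the tool names `detect_objects` and `vla_pick`, their return values, and the hard-coded plan standing in for model output are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    """Hypothetical shape of one tool request emitted by the reasoning model."""
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


def detect_objects(label: str) -> Dict[str, Any]:
    # Hypothetical perception tool; returns a fixed point for illustration.
    return {"label": label, "point": [480, 310]}


def vla_pick(label: str) -> Dict[str, Any]:
    # Hypothetical stand-in for a VLA model, which would own the motor commands.
    return {"action": "pick", "target": label, "status": "ok"}


# Registry of user-defined functions the reasoning module is allowed to call.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "detect_objects": detect_objects,
    "vla_pick": vla_pick,
}


def run_plan(plan: List[ToolCall]) -> List[Dict[str, Any]]:
    """Execute each tool call in order and collect the results."""
    results: List[Dict[str, Any]] = []
    for call in plan:
        tool = TOOLS.get(call.name)
        if tool is None:
            # Unknown tools are reported back instead of raising.
            results.append({"error": f"unknown tool: {call.name}"})
            continue
        results.append(tool(**call.args))
    return results


# Hard-coded plan standing in for what the reasoning model might return.
plan = [
    ToolCall("detect_objects", {"label": "red mug"}),
    ToolCall("vla_pick", {"label": "red mug"}),
]
results = run_plan(plan)
```

In a real system the plan would come from the model's response and the results would be fed back to it for the next reasoning step; the registry keeps the set of callable functions explicit.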

Classification
Multimodal · Robotics foundation model
Family: Gemini
Access & deployment
Access: API, Hosted
Deployment: Cloud
Weights: Closed
Key parameters
Context: 128K
Tools: supported
Input: text, image, audio, video
Robotics
Spatial reasoning, scene understanding, embodied task planning, visual grounding, object affordance understanding, spatial prediction

Technical specification

Context window: 128K tokens
Max output tokens: 64,000 tokens per response
Features: Tool use
Modalities
Input: text, image, audio, video
Output: text
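The limits listed in this specification can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: the function `check_request` is hypothetical, "128K" is assumed to mean 128,000 tokens, and the input and output limits are checked independently because the page does not say whether output tokens count against the context window.

```python
from typing import Iterable, List

# Values taken from the technical specification above.
SUPPORTED_INPUTS = {"text", "image", "audio", "video"}
MAX_OUTPUT_TOKENS = 64_000
# Assumption: "128K" is read as 128,000 tokens; the exact figure is not stated.
CONTEXT_WINDOW_TOKENS = 128_000


def check_request(
    modalities: Iterable[str],
    input_tokens: int,
    max_output_tokens: int,
) -> List[str]:
    """Return a list of problems; an empty list means the request fits the spec."""
    problems: List[str] = []

    unsupported = sorted(set(modalities) - SUPPORTED_INPUTS)
    if unsupported:
        problems.append(f"unsupported input modality: {', '.join(unsupported)}")

    if input_tokens > CONTEXT_WINDOW_TOKENS:
        problems.append(
            f"input of {input_tokens} tokens exceeds the {CONTEXT_WINDOW_TOKENS}-token context window"
        )

    if max_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"requested output of {max_output_tokens} tokens exceeds the {MAX_OUTPUT_TOKENS}-token limit"
        )

    return problems


ok = check_request(["text", "image"], input_tokens=50_000, max_output_tokens=8_000)
bad = check_request(["text", "lidar"], input_tokens=200_000, max_output_tokens=70_000)
```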

Capabilities and applications

Native model capabilities
Reasoning · Category: reasoning
Multi-step reasoning · Category: reasoning
Planning · Category: planning
Image understanding · Category: vision
Multimodal understanding · Category: multimodal
Function calling · Category: planning
Structured output · Category: structured_generation
Video understanding · Category: video
Audio understanding · Category: audio
Robotics
Spatial reasoning, scene understanding, embodied task planning, visual grounding, object affordance understanding, spatial prediction

Benchmark results

2 benchmarks
Instrument Reading (internal, agentic vision disabled)
success rate · agentic vision disabled
86%
Source: https://deepmind.google/blog/gemini-robotics-er-1-6/
Score for Gemini Robotics-ER 1.6 without agentic vision. For comparison: ER 1.5 = 23%, Gemini 3.0 Flash = 67%.
Instrument Reading (internal, agentic vision enabled)
success rate · agentic vision enabled (zoom + code execution)
93%
Source: https://deepmind.google/blog/gemini-robotics-er-1-6/
Score with agentic vision enabled, which combines visual reasoning with code execution.
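To put the reported figures side by side, the gaps can be worked out in percentage points and as a reduction in failure rate. This is a small arithmetic sketch using only the four success rates quoted above; it assumes they were all measured on the same internal evaluation set, and the helper names `point_gain` and `error_reduction` are illustrative.

```python
# Success rates (percent) as reported on this page.
scores = {
    "ER 1.5": 23,
    "Gemini 3.0 Flash": 67,
    "ER 1.6": 86,
    "ER 1.6 + agentic vision": 93,
}


def point_gain(new: str, old: str) -> int:
    """Difference in success rate, in percentage points."""
    return scores[new] - scores[old]


def error_reduction(new: str, old: str) -> float:
    """Fraction of the old configuration's failures that the new one removes."""
    old_error = 100 - scores[old]
    new_error = 100 - scores[new]
    return (old_error - new_error) / old_error


gain_over_er15 = point_gain("ER 1.6", "ER 1.5")                   # 86 - 23
gain_over_flash = point_gain("ER 1.6", "Gemini 3.0 Flash")        # 86 - 67
gain_from_agentic = point_gain("ER 1.6 + agentic vision", "ER 1.6")  # 93 - 86
agentic_error_cut = error_reduction("ER 1.6 + agentic vision", "ER 1.6")  # (14 - 7) / 14
```

On these numbers, ER 1.6 is 63 points above ER 1.5 and 19 points above Gemini 3.0 Flash, and enabling agentic vision adds 7 points, which halves the remaining failure rate from 14% to 7%.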

Technical architecture

Core Architecture
Training Techniques