Gemini 2.5 Pro

gemini-2.5-pro · Family: Gemini
Gemini 2.5 Pro is Google DeepMind's flagship reasoning model, generally available since June 17, 2025. It is built on a Sparse MoE architecture and supports a context window of up to 1M tokens, text, audio, image, and video input, and an integrated thinking mode.
✓ Active · ✓ Public access · LLM · Multimodal · Reasoning model · Tool-using model · 📁 Gemini
Context window: up to 1M tokens
Parameters: undisclosed
Max output: 65,536 tokens
Release date: 25 March 2025 (preview)
Access: API, Hosted · Deployment: ☁ Cloud

Overview

Gemini 2.5 Pro is Google DeepMind's flagship language model. It was released in preview on March 25, 2025, and general availability (GA) followed on June 17, 2025. API identifier: gemini-2.5-pro.

Architecture and Capabilities

The model is built on a Sparse Mixture of Experts (MoE) architecture. Context window: 1,048,576 tokens; maximum output tokens: 65,536. Knowledge cutoff: January 2025. Supports multimodal input (text, image, audio, video, documents), tool use, and a built-in thinking mode (extended reasoning). Fine-tuning is not available.
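The limits quoted above can be turned into a simple pre-flight check. The following is a minimal sketch, not part of any Google SDK: the function name `check_request` is hypothetical, token counts are supplied by the caller (no tokenizer is involved), and it assumes the input and output limits apply independently of each other.

```python
# Limits quoted in the text above (1,048,576 = 2**20, 65,536 = 2**16).
CONTEXT_WINDOW_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536


def check_request(input_tokens: int, requested_output_tokens: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits.

    Hypothetical helper. Assumes the input limit and the output limit
    are checked separately, which is an assumption of this sketch.
    """
    problems = []
    if input_tokens < 0 or requested_output_tokens < 0:
        problems.append("token counts must be non-negative")
    if input_tokens > CONTEXT_WINDOW_TOKENS:
        over = input_tokens - CONTEXT_WINDOW_TOKENS
        problems.append(f"input exceeds the context window by {over} tokens")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        over = requested_output_tokens - MAX_OUTPUT_TOKENS
        problems.append(f"requested output exceeds the maximum by {over} tokens")
    return problems
```

Calling `check_request(900_000, 8_192)` returns an empty list, while a request with 2,000,000 input tokens and 70,000 output tokens is flagged on both counts.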

Benchmark Results

Reported launch scores: SWE-bench Verified 63.8% (custom agent setup), GPQA Diamond 84.0% (pass@1), AIME 2025 86.7% (pass@1), AIME 2024 92.0%, and Humanity's Last Exam 18.8% without tools (the highest score at launch). Further results: Aider Polyglot 74.0%, MMMU 81.7%, Global MMLU Lite 89.8%, and MRCR v1 91.5% at 128K context and 83.1% at the full 1M tokens. Following the June 2025 preview update, the model led both the LMArena (Elo 1470) and WebDev Arena (Elo 1443) leaderboards.

Pricing and Availability

Closed-weights model, available via the Gemini API (Google AI Studio) and Vertex AI. Pricing has two tiers based on context length. Up to 200K tokens: $1.25/MTok input, $10.00/MTok output. Above 200K tokens: $2.50/MTok input, $15.00/MTok output. Thinking tokens are billed as output. A Batch API is available at a discount of roughly 50%. A free tier is available in Google AI Studio.
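As a worked example of the two-tier pricing, the sketch below estimates the cost of a single request. The function and constant names are hypothetical, and two assumptions go beyond the text: the tier is chosen by the number of input tokens, and the selected tier's rates apply to both input and output. The batch discount is treated as exactly 50%, although the text only says "roughly".

```python
from dataclasses import dataclass

TIER_THRESHOLD_TOKENS = 200_000  # boundary between the two price tiers


@dataclass(frozen=True)
class Rates:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


STANDARD = Rates(input_per_mtok=1.25, output_per_mtok=10.00)
LONG_CONTEXT = Rates(input_per_mtok=2.50, output_per_mtok=15.00)
BATCH_DISCOUNT = 0.50  # assumption: "roughly 50%" taken as exactly 50%


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      thinking_tokens: int = 0, batch: bool = False) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    # Assumption: the tier is selected by the input (prompt) size.
    rates = STANDARD if input_tokens <= TIER_THRESHOLD_TOKENS else LONG_CONTEXT
    # Thinking tokens are billed at the output rate, as stated in the text.
    billable_output = output_tokens + thinking_tokens
    cost = (input_tokens * rates.input_per_mtok
            + billable_output * rates.output_per_mtok) / 1_000_000
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return cost
```

Under these assumptions, a request with 100,000 input tokens, 2,000 output tokens, and 8,000 thinking tokens costs about $0.225 ($0.125 for input plus $0.10 for output), and the same request through the Batch API costs about $0.1125.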

Safety

The model has been evaluated in accordance with Google DeepMind's Frontier Safety Framework (cybersecurity, CBRN, ML R&D, deceptive alignment). At Google I/O 2025, the Gemini 2.5 family was described as the "most secure model family to date," with notable improvements to indirect prompt injection defenses. The Deep Think mode underwent additional safety evaluations before broad release. The paid API tier does not use customer data for model training.

Classification
LLM · Multimodal · Reasoning model · Tool-using model
Family: Gemini
Access & deployment
API · Hosted
Cloud
Weights: Closed
Key parameters
📏 Context: up to 1M tokens
🧩 Parameters: undisclosed
Tools
📥 Input: text, image, audio, video

Technical specification

Context window: up to 1M tokens
Parameters: undisclosed
Max output tokens: 65,536 tokens per response
Knowledge cutoff: January 2025
License: proprietary
Hardware requirements: Access via Google Cloud infrastructure (Vertex AI / Gemini API)
Features: Tool use
Modalities
⬇ Input: text, image, audio, video, documents, structured data, URLs
⬆ Output: text, code, structured data, summaries, analytical reports, image

Capabilities and applications

Native model capabilities
Reasoning (category: reasoning)
Multi-step reasoning (category: reasoning)
Long context (category: reasoning)
Coding (category: coding)
Function calling (category: planning)
Structured output (category: structured_generation)
Audio understanding (category: audio)
Image understanding (category: vision)
Video understanding (category: video)
Chart understanding (category: vision)
Diagram reasoning (category: reasoning)
OCR (category: vision)
Multilingual (category: language)
Planning (category: planning)
Streaming output (category: reasoning)
Interleaved multimodal input (category: reasoning)
Multimodal understanding (category: multimodal)

Benchmark results

15 benchmarks
MMLU
accuracy · general knowledge benchmark
90%+
📅 25 Mar 2025 · 📄 Google DeepMind
Approximate result based on Google materials.
SWE-bench Verified
accuracy · Custom agent setup with multiple trajectories and model-based re-scoring. Model ID: gemini-2.5-pro-exp-03-25.
63.8%
📅 25 Mar 2025 · 📄 Google DeepMind – official Gemini 2.5 blog (blog.google, March 2025)
Result from Google's custom agent setup. At launch, scored higher than OpenAI o3-mini (61.0%) and lower than Claude 3.7 Sonnet (70.3%). Technical report (06-05 snapshot) records 67.2%.
GPQA Diamond
pass@1 · Single attempt (pass@1), no majority voting. Graduate-level STEM questions.
84.0%
📅 25 Mar 2025 · 📄 Google DeepMind – official Gemini 2.5 blog (blog.google, March 2025) / technical report gemini_v2_5_report.pdf
Highest score among compared models at launch. Grok 3 Beta: 80.2%, o3-mini: lower.
AIME 2025
pass@1 · Single-attempt evaluation (pass@1), no majority voting. American Invitational Mathematics Examination 2025.
86.7%
📅 25 Mar 2025 · 📄 Google DeepMind – official Gemini 2.5 blog (blog.google, March 2025) / technical report
Leaderboard score at launch. o3-mini: 86.5% (marginally lower). Results from matharena.ai.
AIME 2024
pass@1 · Single-attempt evaluation (pass@1), no majority voting.
92.0%
📅 25 Mar 2025 · 📄 Google DeepMind – official Gemini 2.5 blog (blog.google, March 2025) / technical report
Highest score among compared models at launch.
Humanity's Last Exam (no tools)
accuracy · No tool use. A multidisciplinary benchmark created by domain experts.
18.8%
📅 25 Mar 2025 · 📄 Google DeepMind – official Gemini 2.5 blog (blog.google, March 2025)
Highest score at launch without tools. o3-mini: 14.0%, Claude 3.7 Sonnet: 8.9%, DeepSeek R1: 8.6%.
LiveCodeBench v5
pass@1 · Results from livecodebench.github.io (10/1/2024–2/1/2025 in UI).
70.4%
📅 25 Mar 2025 · 📄 Google DeepMind – technical report gemini_v2_5_report.pdf / DataCamp
Slightly below o3-mini (74.1%) and Grok 3 Beta (70.6%) at launch. The technical report records 74.2%, up from 30.5% for Gemini 1.5 Pro.
Aider Polyglot (Whole File Editing)
pass_rate · Average of 3 runs. Multilingual code editing. Results from aider.chat/docs/leaderboards/.
74.0%
📅 25 Mar 2025 · 📄 Google DeepMind – official Gemini 2.5 blog / technical report
Technical report score: 82.2% (newer snapshot 06-05). Launch score (03-25): 74.0%.
MMMU
pass@1 · Multimodal academic reasoning (texts, images, diagrams, maps).
81.7%
📅 25 Mar 2025 · 📄 Google DeepMind – technical report gemini_v2_5_report.pdf / Medium (Mehul Gupta)
Highest pass@1 among compared models at launch.
MRCR v1 (128K context)
accuracy · Multi-Round Coreference Resolution – locating multiple needles within a 128K context.
91.5%
📅 25 Mar 2025 · 📄 Google DeepMind – technical report gemini_v2_5_report.pdf
Results added March 26, 2025 as a blog update. In the 1M token version: 83.1%.
MRCR v1 (1M context)
accuracy · Multi-round coreference resolution using the full 1M token context window.
83.1%
📅 25 Mar 2025 · 📄 Google DeepMind – technical report gemini_v2_5_report.pdf
The only model in the benchmark supporting the full 1M token context at launch.
Global MMLU Lite (multilingual)
accuracy · Multilingual and multidisciplinary text comprehension.
89.8%
📅 25 Mar 2025 · 📄 Google DeepMind – technical report gemini_v2_5_report.pdf
Highest score among compared models at launch.
SimpleQA
accuracy · Short-form factual questions.
52.9%
📅 25 Mar 2025 · 📄 Google DeepMind – technical report gemini_v2_5_report.pdf
GPT-4.5 scored 62.5% on this benchmark.
LMArena (Chatbot Arena)
Elo · Human preference ranking of AI responses. Score from the preview update (June 2025) prior to GA release.
1470 points
📅 1 Jun 2025 · 📄 Google DeepMind – blog.google (June 2025 preview update)
Leaderboard leader following the preview update. An increase of 24 Elo points relative to the May version.
WebDev Arena
Elo · Web development ranking. Increase of 35 Elo points.
1443 points
📅 1 Jun 2025 · 📄 Google DeepMind – blog.google (June 2025 preview update)
Leads the WebDev Arena leaderboard following the preview update (June 2025).
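
To give a sense of what the reported Elo gains mean (24 points on LMArena, 35 on WebDev Arena), the sketch below converts a rating difference into an expected head-to-head score using the standard Elo logistic formula with a 400-point scale. This is an illustrative assumption: the arena leaderboards may compute their ratings differently, and the "previous version" ratings are simply derived here by subtracting the reported gains.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model.

    Uses the conventional 400-point logistic scale; whether a given
    leaderboard uses exactly this model is an assumption of the sketch.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings and gains as reported in the entries above.
LMARENA_RATING, LMARENA_GAIN = 1470, 24
WEBDEV_RATING, WEBDEV_GAIN = 1443, 35

# Expected score of the updated model against its previous version.
p_lmarena = elo_expected_score(LMARENA_RATING, LMARENA_RATING - LMARENA_GAIN)
p_webdev = elo_expected_score(WEBDEV_RATING, WEBDEV_RATING - WEBDEV_GAIN)

print(f"LMArena:      {p_lmarena:.3f}")
print(f"WebDev Arena: {p_webdev:.3f}")
```

Under this model, a 24-point gain corresponds to an expected score of roughly 0.53 against the earlier version, and a 35-point gain to roughly 0.55, so both updates represent modest but measurable preference shifts.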

Deployment and security

🔒 Security / Enterprise
✓ Verified enterprise information

The model was evaluated for cybersecurity, CBRN, Machine Learning R&D, Deceptive Alignment, autonomy, and other risks in accordance with Google DeepMind's Frontier Safety Framework. The full safety evaluations are included in the technical report, and a model card is available at modelcards.withgoogle.com.

At Google I/O 2025, Google announced significant improvements to protection against indirect prompt injection attacks, describing Gemini 2.5 as the "most secure model family to date". Deep Think mode underwent additional safety evaluations before broad release. Training data was subject to safety filtering. The paid API tier does not use customer data for model training, unlike the free tier.
Updated: 17 Jun 2025