Gemini 2.5 Pro

gemini-2.5-pro · Family: Gemini

Gemini 2.5 Pro is Google DeepMind's flagship reasoning model, generally available June 17, 2025. Built on Sparse MoE architecture, supports up to 1M token context, text/audio/image/video input, and integrated thinking mode.

✓ Active✓ Public accessLLMMultimodalReasoning modelTool-using model📁 Gemini

Context window

do 1M tokenów

tokens

Parameters

nieujawnione

parameters

Max output

65,536

tokens

Release date

25 March 2025

🔬Google DeepMindResearch lab 🏢GoogleOwner

Access:APIHostedDeployment:☁ Cloud

Overview

Gemini 2.5 Pro is Google DeepMind's flagship language model. Released in preview on March 25, 2025, general availability (GA) followed on June 17, 2025. API identifier: gemini-2.5-pro.

Architecture and Capabilities

The model is built on a Sparse Mixture of Experts (MoE) architecture. Context window: 1,048,576 tokens; maximum output tokens: 65,536. Knowledge cutoff: January 2025. Supports multimodal input (text, image, audio, video, documents), tool use, and a built-in thinking mode (extended reasoning). Fine-tuning is not available.

Benchmark Results

SWE-bench Verified 63.8% (custom agent setup), GPQA Diamond 84.0% (pass@1), AIME 2025 86.7% (pass@1), AIME 2024 92.0%, Humanity's Last Exam 18.8% without tools (highest score at launch). Aider Polyglot 74.0%, MMMU 81.7%, Global MMLU Lite 89.8%. MRCR v1 91.5% at 128K context and 83.1% at the full 1M tokens. Following the June preview update, the model leads both LMArena (Elo 1470) and WebDev Arena (Elo 1443) leaderboards.

Pricing and Availability

Closed weights model, available via the Gemini API (Google AI Studio) and Vertex AI. Two-tier pricing based on context length: ≤200K tokens — $1.25/MTok input, $10.00/MTok output; >200K tokens — $2.50/$15.00/MTok. Thinking tokens are billed as output. Batch API available at ~50% discount. A free tier is available in Google AI Studio.

Safety

The model has been evaluated in accordance with Google DeepMind's Responsible Scaling Policy (cybersecurity, CBRN, ML R&D, deceptive alignment). At Google I/O 2025 it was described as the "most secure model family to date," with notable improvements to indirect prompt injection defenses. The Deep Think mode underwent additional safety evaluations before broad release. The paid API tier does not use customer data for model training.

Classification

LLMMultimodalReasoning modelTool-using model

Family: Gemini

Applications

Chatbots Document analysis Data analysis

Access & deployment

APIHosted

Cloud

Weights: Closed

Key parameters

📏 Context: do 1M tokenów

🧩 Parameters: nieujawnione

✓ Tools

📥 Input: text, image, audio, video…

Technical specification

Context window

do 1M tokenów

tokens

Parameters

nieujawnione

parameters

Max output tokens

65,536

tokens per response

Knowledge cutoff

1 Jan 2025

Knowledge boundary

License

proprietary

Hardware requirements

Access via Google Cloud infrastructure (Vertex AI / Gemini API)

Features:✓ Tool use

Modalities

⬇ Input

textimageaudiovideodocumentsstructured_dataurls

⬆ Output

textcodestructured_datasummariesanalytical_reportsimage

Capabilities and applications

Native model capabilities

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Multi-step reasoning

Carrying out multi-step chains of reasoning across long, complex tasks.

Category: reasoning

Long context

Maintaining coherence and focus across very long input context.

Category: language

Coding

Generating, analysing and modifying source code.

Category: coding

Function Calling

Category: planning

Structured output

Producing data in structured formats such as JSON.

Category: structured_generation

Audio understanding

Category: audio

Image understanding

Analysing and interpreting the content of images.

Category: vision

Video Understanding

Category: video

Chart understanding

Reading and interpreting charts, tables and diagrams.

Category: vision

Diagram reasoning

Category: reasoning

OCR

Recognising text within images and documents.

Category: vision

Multilingual

Understanding and generating text in many languages.

Category: language

Planning

Forming and executing action plans for complex tasks.

Category: planning

Streaming output

Category: reasoning

Interleaved Multimodal Input

Category: reasoning

Multimodal understanding

Category: multimodal

Application domains

Chatbots Document analysis Data analysis

Benchmark results

15 benchmarks

MMLU

accuracy · general knowledge benchmark

90%+%

📅 25 Mar 2025📄 Google DeepMind

Approximate result based on Google materials.

SWE-bench Verified

accuracy · Custom agent setup with multiple trajectories and model-based re-scoring. Model ID: gemini-2.5-pro-exp-03-25.

63.8%

📅 25 Mar 2025📄 Google DeepMind – oficjalny blog Gemini 2.5 (blog.google, marzec 2025)

Result from Google's custom agent setup. At launch, scored higher than OpenAI o3-mini (61.0%) and lower than Claude 3.7 Sonnet (70.3%). Technical report (06-05 snapshot) records 67.2%.

GPQA Diamond

pass@1 · Single attempt (pass@1), no majority voting. Graduate-level STEM questions.

84.0%

📅 25 Mar 2025📄 Google DeepMind – oficjalny blog Gemini 2.5 (blog.google, marzec 2025) / technical report gemini_v2_5_report.pdf

Highest score among compared models at launch. Grok 3 Beta: 80.2%, o3-mini: lower.

AIME 2025

pass@1 · Single-attempt evaluation (pass@1), no majority voting. American Invitational Mathematics Examination 2025.

86.7%

📅 25 Mar 2025📄 Google DeepMind – oficjalny blog Gemini 2.5 (blog.google, marzec 2025) / technical report

Leaderboard score at launch. o3-mini: 86.5% (marginally lower). Results from matharena.ai.

AIME 2024

pass@1 · Single-attempt evaluation (pass@1), no majority voting.

92.0%

📅 25 Mar 2025📄 Google DeepMind – oficjalny blog Gemini 2.5 (blog.google, marzec 2025) / technical report

Highest score among compared models at launch.

Humanity's Last Exam (bez narzędzi)

accuracy · No tool use. A multidisciplinary benchmark created by domain experts.

18.8%

📅 25 Mar 2025📄 Google DeepMind – oficjalny blog Gemini 2.5 (blog.google, marzec 2025)

Highest score at launch without tools. o3-mini: 14.0%, Claude 3.7 Sonnet: 8.9%, DeepSeek R1: 8.6%.

LiveCodeBench v5

pass@1 · Results from livecodebench.github.io (10/1/2024–2/1/2025 in UI).

70.4%

📅 25 Mar 2025📄 Google DeepMind – technical report gemini_v2_5_report.pdf / DataCamp

Slightly below o3-mini (74.1%) and Grok 3 Beta (70.6%). Improved from 30.5% (Gemini 1.5 Pro) to 74.2% per the technical report.

Aider Polyglot (Whole File Editing)

pass_rate · Average of 3 runs. Multilingual code editing. Results from aider.chat/docs/leaderboards/.

74.0%

📅 25 Mar 2025📄 Google DeepMind – oficjalny blog Gemini 2.5 / technical report

Technical report score: 82.2% (newer snapshot 06-05). Launch score (03-25): 74.0%.

MMMU

pass@1 · Multimodal academic reasoning (texts, images, diagrams, maps).

81.7%

📅 25 Mar 2025📄 Google DeepMind – technical report gemini_v2_5_report.pdf / Medium (Mehul Gupta)

Highest pass@1 among compared models at launch.

MRCR v1 (128K context)

accuracy · Multi-Round Coreference Resolution – locating multiple needles within a 128K context.

91.5%

📅 25 Mar 2025📄 Google DeepMind – technical report gemini_v2_5_report.pdf

Results added March 26, 2025 as a blog update. In the 1M token version: 83.1%.

MRCR v1 (1M context)

accuracy · Multi-round coreference resolution using the full 1M token context window.

83.1%

📅 25 Mar 2025📄 Google DeepMind – technical report gemini_v2_5_report.pdf

The only model in the benchmark supporting the full 1M token context at launch.

Global MMLU Lite (multilingual)

accuracy · Multilingual and multidisciplinary text comprehension.

89.8%

📅 25 Mar 2025📄 Google DeepMind – technical report gemini_v2_5_report.pdf

Highest score among compared models at launch.

SimpleQA

accuracy · Short-form factual questions.

52.9%

📅 25 Mar 2025📄 Google DeepMind – technical report gemini_v2_5_report.pdf

GPT-4.5 scored 62.5% on this benchmark.

LMArena (Chatbot Arena)

Elo · Human preference ranking of AI responses. Score from the preview update (June 2025) prior to GA release.

1470points

📅 1 Jun 2025📄 Google DeepMind – blog.google (czerwiec 2025 preview update)

Leaderboard leader following the preview update. An increase of 24 Elo points relative to the May version.

WebDev Arena

Elo · Web development ranking. Increase of 35 Elo points.

1443points

📅 1 Jun 2025📄 Google DeepMind – blog.google (czerwiec 2025 preview update)

Leads the WebDev Arena leaderboard following the preview update (June 2025).

Pricing

Deployment and security

🔒 Security / Enterprise

✓ Verified enterprise information

Model evaluated for cybersecurity, CBRN, autonomy, and other risks in accordance with Google DeepMind's Responsible Scaling Policy. Detailed safety assessments are included in the technical report and model card. Advanced mitigations against indirect prompt injection have been implemented.

The technical report includes full safety evaluations covering cybersecurity, CBRN, Machine Learning R&D, and Deceptive Alignment. A model card is available at modelcards.withgoogle.com. At Google I/O 2025, Google announced significant improvements to protection against indirect prompt injection attacks, describing Gemini 2.5 as the "most secure model family to date". Deep Think mode underwent additional safety evaluations before broad release. Training data was subject to safety filtering. The paid API tier does not use data for model training, unlike the free tier.

Updated: 17 Jun 2025↗ Security documentation

Sources and related pages

14 sources

Webhttps://ai.google.dev/ai.google.dev Webhttps://deepmind.google/technologies/gemini/deepmind.google Webhttps://deepmind.google/technologies/gemini/deepmind.google BlogGemini 2.5: Our newest Gemini model with thinking – Google DeepMind Blogblog.google DocsGemini 2.5 Pro – Gemini API | Google AI for Developersai.google.dev DocsGemini 2.5 Pro – Vertex AI | Google Cloud Documentationdocs.cloud.google.com DocsGemini Developer API Pricing – Google AI for Developersai.google.dev DocsGemini API Release Notes – Google AI for Developersai.google.dev ReportGemini 2.5 Technical Report (PDF) – Google DeepMindstorage.googleapis.com WebGemini 2.5 Pro – Google DeepMind Models Pagedeepmind.google BlogGemini 2.5 Updates: Flash/Pro GA, SFT, Flash-Lite on Vertex AI – Google Cloud Blogcloud.google.com BlogGoogle I/O 2025: Updates to Gemini 2.5 – Google DeepMind Blogblog.google BlogGemini 2.5 Pro Latest Preview – Google Blog (czerwiec 2025)blog.google WebGemini 2.5 Pro Model Card – Google Model Cardsmodelcards.withgoogle.com

Browse related topics

📁 Gemini 🌐 Chatbots 🌐 Document analysis 🌐 Data analysis All llm models All multimodal model models