Gemini 2.5 Pro is Google DeepMind's flagship reasoning model, generally available June 17, 2025. Built on Sparse MoE architecture, supports up to 1M token context, text/audio/image/video input, and integrated thinking mode.
Context window
do 1M tokenów
tokens
Parameters
nieujawnione
parameters
Max output
65,536
tokens
Release date
25 March 2025
Access:APIHostedDeployment:☁ Cloud
Overview
Applications
Access & deployment
APIHosted
Cloud
Weights: Closed
Key parameters
📏 Context: do 1M tokenów
🧩 Parameters: nieujawnione
✓ Tools
📥 Input: text, image, audio, video…
Technical specification
Context window
do 1M tokenów
tokens
Parameters
nieujawnione
parameters
Max output tokens
65,536
tokens per response
Knowledge cutoff
1 Jan 2025
Knowledge boundary
License
proprietary
Hardware requirements
Access via Google Cloud infrastructure (Vertex AI / Gemini API)
Features:✓ Tool use
Modalities
⬇ Input
textimageaudiovideodocumentsstructured_dataurls
⬆ Output
textcodestructured_datasummariesanalytical_reportsimage
Capabilities and applications
Native model capabilities
Reasoning
Category: reasoning
Multi-step reasoning
Category: reasoning
Long context
Category: reasoning
Coding
Category: coding
Function Calling
Category: planning
Structured output
Category: structured_generation
Audio understanding
Category: audio
Image understanding
Category: vision
Video Understanding
Category: video
Chart understanding
Category: vision
Diagram reasoning
Category: reasoning
OCR
Category: vision
Multilingual
Category: language
Planning
Category: planning
Streaming output
Category: reasoning
Interleaved Multimodal Input
Category: reasoning
Multimodal understanding
Category: multimodal
Application domains
Benchmark results
15 benchmarks
MMLU
accuracy · general knowledge benchmark
90%+%
📅 25 Mar 2025📄 Google DeepMind
Approximate result based on Google materials.
SWE-bench Verified
accuracy · Custom agent setup with multiple trajectories and model-based re-scoring. Model ID: gemini-2.5-pro-exp-03-25.
63.8%
📅 25 Mar 2025📄 Google DeepMind – oficjalny blog Gemini 2.5 (blog.google, marzec 2025)
Result from Google's custom agent setup. At launch, scored higher than OpenAI o3-mini (61.0%) and lower than Claude 3.7 Sonnet (70.3%). Technical report (06-05 snapshot) records 67.2%.
GPQA Diamond
pass@1 · Single attempt (pass@1), no majority voting. Graduate-level STEM questions.
84.0%
📅 25 Mar 2025📄 Google DeepMind – oficjalny blog Gemini 2.5 (blog.google, marzec 2025) / technical report gemini_v2_5_report.pdf
Highest score among compared models at launch. Grok 3 Beta: 80.2%, o3-mini: lower.
AIME 2025
pass@1 · Single-attempt evaluation (pass@1), no majority voting. American Invitational Mathematics Examination 2025.
86.7%
📅 25 Mar 2025📄 Google DeepMind – oficjalny blog Gemini 2.5 (blog.google, marzec 2025) / technical report
Leaderboard score at launch. o3-mini: 86.5% (marginally lower). Results from matharena.ai.
AIME 2024
pass@1 · Single-attempt evaluation (pass@1), no majority voting.
92.0%
📅 25 Mar 2025📄 Google DeepMind – oficjalny blog Gemini 2.5 (blog.google, marzec 2025) / technical report
Highest score among compared models at launch.
Humanity's Last Exam (bez narzędzi)
accuracy · No tool use. A multidisciplinary benchmark created by domain experts.
18.8%
📅 25 Mar 2025📄 Google DeepMind – oficjalny blog Gemini 2.5 (blog.google, marzec 2025)
Highest score at launch without tools. o3-mini: 14.0%, Claude 3.7 Sonnet: 8.9%, DeepSeek R1: 8.6%.
LiveCodeBench v5
pass@1 · Results from livecodebench.github.io (10/1/2024–2/1/2025 in UI).
70.4%
📅 25 Mar 2025📄 Google DeepMind – technical report gemini_v2_5_report.pdf / DataCamp
Slightly below o3-mini (74.1%) and Grok 3 Beta (70.6%). Improved from 30.5% (Gemini 1.5 Pro) to 74.2% per the technical report.
Aider Polyglot (Whole File Editing)
pass_rate · Average of 3 runs. Multilingual code editing. Results from aider.chat/docs/leaderboards/.
74.0%
📅 25 Mar 2025📄 Google DeepMind – oficjalny blog Gemini 2.5 / technical report
Technical report score: 82.2% (newer snapshot 06-05). Launch score (03-25): 74.0%.
MMMU
pass@1 · Multimodal academic reasoning (texts, images, diagrams, maps).
81.7%
📅 25 Mar 2025📄 Google DeepMind – technical report gemini_v2_5_report.pdf / Medium (Mehul Gupta)
Highest pass@1 among compared models at launch.
MRCR v1 (128K context)
accuracy · Multi-Round Coreference Resolution – locating multiple needles within a 128K context.
91.5%
📅 25 Mar 2025📄 Google DeepMind – technical report gemini_v2_5_report.pdf
Results added March 26, 2025 as a blog update. In the 1M token version: 83.1%.
MRCR v1 (1M context)
accuracy · Multi-round coreference resolution using the full 1M token context window.
83.1%
📅 25 Mar 2025📄 Google DeepMind – technical report gemini_v2_5_report.pdf
The only model in the benchmark supporting the full 1M token context at launch.
Global MMLU Lite (multilingual)
accuracy · Multilingual and multidisciplinary text comprehension.
89.8%
📅 25 Mar 2025📄 Google DeepMind – technical report gemini_v2_5_report.pdf
Highest score among compared models at launch.
SimpleQA
accuracy · Short-form factual questions.
52.9%
📅 25 Mar 2025📄 Google DeepMind – technical report gemini_v2_5_report.pdf
GPT-4.5 scored 62.5% on this benchmark.
LMArena (Chatbot Arena)
Elo · Human preference ranking of AI responses. Score from the preview update (June 2025) prior to GA release.
1470points
📅 1 Jun 2025📄 Google DeepMind – blog.google (czerwiec 2025 preview update)
Leaderboard leader following the preview update. An increase of 24 Elo points relative to the May version.
WebDev Arena
Elo · Web development ranking. Increase of 35 Elo points.
1443points
📅 1 Jun 2025📄 Google DeepMind – blog.google (czerwiec 2025 preview update)
Leads the WebDev Arena leaderboard following the preview update (June 2025).
Pricing
Deployment and security
🔒 Security / Enterprise
✓ Verified enterprise information
Model evaluated for cybersecurity, CBRN, autonomy, and other risks in accordance with Google DeepMind's Responsible Scaling Policy. Detailed safety assessments are included in the technical report and model card. Advanced mitigations against indirect prompt injection have been implemented.
The technical report includes full safety evaluations covering cybersecurity, CBRN, Machine Learning R&D, and Deceptive Alignment. A model card is available at modelcards.withgoogle.com. At Google I/O 2025, Google announced significant improvements to protection against indirect prompt injection attacks, describing Gemini 2.5 as the "most secure model family to date". Deep Think mode underwent additional safety evaluations before broad release. Training data was subject to safety filtering. The paid API tier does not use data for model training, unlike the free tier.
Updated: 17 Jun 2025↗ Security documentation
Sources and related pages
14 sources
Webhttps://ai.google.dev/Webhttps://deepmind.google/technologies/gemini/Webhttps://deepmind.google/technologies/gemini/BlogGemini 2.5: Our newest Gemini model with thinking – Google DeepMind BlogDocsGemini 2.5 Pro – Gemini API | Google AI for DevelopersDocsGemini 2.5 Pro – Vertex AI | Google Cloud DocumentationDocsGemini Developer API Pricing – Google AI for DevelopersDocsGemini API Release Notes – Google AI for DevelopersReportGemini 2.5 Technical Report (PDF) – Google DeepMindWebGemini 2.5 Pro – Google DeepMind Models PageBlogGemini 2.5 Updates: Flash/Pro GA, SFT, Flash-Lite on Vertex AI – Google Cloud BlogBlogGoogle I/O 2025: Updates to Gemini 2.5 – Google DeepMind BlogBlogGemini 2.5 Pro Latest Preview – Google Blog (czerwiec 2025)WebGemini 2.5 Pro Model Card – Google Model Cards
