Gemini 3.1 Flash-Lite is the most cost-efficient thinking model in Google DeepMind's Gemini 3 series, designed for high throughput and low latency while retaining reasoning quality.
Context window: 1M tokens
Max output: 65,536 tokens
Release date: 29 April 2026
Access: API, Hosted · Deployment: ☁ Cloud
Access & deployment
API, Hosted
Cloud
Weights: Closed
Key parameters
📏 Context: 1M
✓ Tools
📥 Input: text, image, audio, video…
Platforms
Technical specification
Context window: 1M tokens
Max output tokens: 65,536 tokens per response
Knowledge cutoff (knowledge boundary): 1 Jan 2025
License: proprietary
Hardware requirements: Available only through Google cloud infrastructure (Gemini API, Vertex AI, Google AI Studio).
Features: ✓ Tool use
Modalities
⬇ Input: text, image, audio, video, documents
⬆ Output: text, code
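The context window and per-response output limit listed above can be turned into a simple pre-flight check before a request is sent. The sketch below is illustrative only: `plan_request` is a hypothetical helper, the reading of "1M" as exactly 1,000,000 tokens is an assumption (the real limit may differ, for example 1,048,576), and it assumes only input tokens are checked against the context window.

```python
# Limits taken from the specification above.
# ASSUMPTION: "1M" is read as exactly 1,000,000 tokens; the real limit may differ.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 65_536


def plan_request(input_tokens: int, requested_output_tokens: int) -> dict:
    """Hypothetical helper: check the input size and cap the output budget.

    ASSUMPTION: only the input is compared against the context window.
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    fits = input_tokens <= CONTEXT_WINDOW_TOKENS
    # Never ask for more output than the per-response maximum.
    capped_output = min(requested_output_tokens, MAX_OUTPUT_TOKENS)
    return {
        "fits_context": fits,
        "output_tokens": capped_output if fits else 0,
    }
```

For example, `plan_request(900_000, 100_000)` reports that the input fits and caps the output budget at 65,536 tokens, while an input of 1,000,001 tokens is rejected under the assumed limit.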
Capabilities and applications
Native model capabilities
Reasoning (category: reasoning)
Multi-step reasoning (category: reasoning)
Long context (category: reasoning)
Multimodal understanding (category: multimodal)
Coding (category: coding)
Function calling (category: planning)
Structured output (category: structured_generation)
Audio understanding (category: audio)
Image understanding (category: vision)
Video understanding (category: video)
Chart understanding (category: vision)
Multilingual (category: language)
Streaming output (category: reasoning)
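Function calling and structured output, listed above, are typically used together: the model emits a structured call, and the application decodes it and runs the matching local function. The following is a minimal local sketch of that dispatch step. The JSON shape (`name` and `args` keys), the `get_weather` tool, and the `dispatch` helper are all hypothetical and do not represent the Gemini API wire format.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> dict:
    """Hypothetical tool; a real one would query a weather service."""
    return {"city": city, "forecast": "unknown"}


# Registry mapping tool names to local Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(call_json: str) -> Any:
    """Decode a structured call and run the matching registered tool.

    ASSUMPTION: the call arrives as JSON with "name" and "args" keys.
    """
    call = json.loads(call_json)
    name = call["name"]
    args = call.get("args", {})
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**args)


result = dispatch('{"name": "get_weather", "args": {"city": "Oslo"}}')
```

Keeping the registry explicit means a call naming an unregistered tool fails loudly instead of executing arbitrary code.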
Benchmark results
11 benchmarks, all reported for Gemini 3.1 Flash-Lite High.
Source: 📄 https://deepmind.google/models/gemini/flash-lite/
Humanity's Last Exam: 16.0% accuracy (no tools). Full set (text + MM).
GPQA Diamond: 86.9% accuracy (no tools). Scientific knowledge.
MMMU-Pro: 76.8% accuracy (no tools). Multimodal understanding and reasoning.
CharXiv Reasoning: 73.2% accuracy. Information synthesis from complex charts.
Video-MMMU: 84.8% accuracy. Knowledge acquisition from videos.
SimpleQA Verified: 43.3% accuracy. Parametric knowledge.
FACTS Benchmark Suite: 40.6% accuracy. Factuality across grounding, parametric knowledge, search, and MM.
MMMLU: 88.9% accuracy. Multilingual Q&A.
LiveCodeBench: 72.0% accuracy (UI: 1/1/2025-5/1/2025). Code generation.
MRCR v2 (8-needle, 128k): 60.1% accuracy (128k average). Long context performance.
MRCR v2 (8-needle, 1M): 12.3% accuracy (1M pointwise). Very long context performance (1M tokens).
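The two MRCR v2 results above make the long-context trade-off easy to quantify. As a small worked calculation using only the scores listed in this section (the variable names are illustrative, and the two figures use different aggregation, 128k average versus 1M pointwise, so the comparison is indicative only):

```python
# Scores copied from the benchmark list above (accuracy, in percent).
mrcr_scores = {
    "MRCR v2 (8-needle, 128k)": 60.1,
    "MRCR v2 (8-needle, 1M)": 12.3,
}

# Absolute drop in percentage points when moving from 128k to 1M context.
drop_points = round(
    mrcr_scores["MRCR v2 (8-needle, 128k)"] - mrcr_scores["MRCR v2 (8-needle, 1M)"],
    1,
)

# Share of the 128k score that is retained at 1M, in percent.
retained_percent = round(
    100 * mrcr_scores["MRCR v2 (8-needle, 1M)"] / mrcr_scores["MRCR v2 (8-needle, 128k)"],
    1,
)

print(f"Drop: {drop_points} points; retained: {retained_percent}% of the 128k score")
```

The difference is 47.8 percentage points, which suggests that retrieval-heavy workloads may behave quite differently near the full 1M-token window than at 128k.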
Pricing
Technical architecture
Core Architecture
Model Form
Training Techniques
Deployment and security
☁ Available on platforms
🔒 Security / Enterprise
✓ Verified enterprise information
Model card publicly available.
Updated: 1 May 2026
↗ Security documentation
