Alibaba Qwen3-8B language model with 8.2B parameters, Apache 2.0. Hybrid thinking/non-thinking mode, 128K context, 119 language support, strong in math, coding and agent tasks.
Context window
128K
tokens
Parameters
8.2B
parameters
Max output
32,768
tokens
Release date
29 April 2025
Access:DownloadAPIHostedDeployment:💻 Local☁ Cloud📱 On-device
Overview
Access & deployment
DownloadAPIHosted
LocalCloudOn-device
Weights: Open source
Key parameters
📏 Context: 128K
🧩 Parameters: 8.2B
✓ Tools · ✓ Fine-tuning
📥 Input: text
Technical specification
Context window
128K
tokens
Parameters
8.2B
parameters
Max output tokens
32,768
tokens per response
Knowledge cutoff
1 Apr 2025
Knowledge boundary
License
Apache 2.0
Hardware requirements
GPU with at least ~16 GB VRAM (BF16). Flash Attention 2 recommended. Supported: Transformers (>=4.51.0), vLLM (>=0.8.5), SGLang (>=0.4.6.post1), llama.cpp (>=b5401), Ollama.
Features:✓ Tool use✓ Fine-tuning
Modalities
⬇ Input
text
⬆ Output
textcode
Capabilities and applications
Native model capabilities
Reasoning
The model's ability to reason logically and solve complex problems.
Category: reasoning
Multi-step reasoning
Carrying out multi-step chains of reasoning across long, complex tasks.
Category: reasoning
Mathematical reasoning
The model's ability to solve mathematical tasks requiring multi-step reasoning — equations, proofs, combinatorics, geometry, calculus and competition-level problems.
Category: reasoning
Coding
Generating, analysing and modifying source code.
Category: coding
Multilingual
Understanding and generating text in many languages.
Category: language
Long context
Maintaining coherence and focus across very long input context.
Category: language
Agentic capability
The model's ability to autonomously plan and execute multi-step tasks by sequentially using tools, maintaining context, and adapting to intermediate results.
Category: planning
Planning
Forming and executing action plans for complex tasks.
Category: planning
Function Calling
Category: planning
Structured output
Producing data in structured formats such as JSON.
Category: structured_generation
Language modeling
Ability to predict subsequent tokens and generate coherent natural-language text based on the preceding context.
Category: language
Application domains
Benchmark results
12 benchmarks
MMMU
accuracy · 5-shot, base model Qwen3-8B-Base
76.89%
📅 14 May 2025📄 Qwen3 Technical Report, Table 6 (arXiv 2505.09388)
Score for the base model (Qwen3-8B-Base). Instruction model results available in the technical report.
MMLU-Pro
accuracy · 5-shot CoT, base model Qwen3-8B-Base
56.73%
📅 14 May 2025📄 Qwen3 Technical Report, Table 6 (arXiv 2505.09388)
GPQA
accuracy · 5-shot CoT, base model Qwen3-8B-Base
44.44%
📅 14 May 2025📄 Qwen3 Technical Report, Table 6 (arXiv 2505.09388)
MATH
accuracy · 4-shot CoT, base model Qwen3-8B-Base
60.80%
📅 14 May 2025📄 Qwen3 Technical Report, Table 6 (arXiv 2505.09388)
Full MATH benchmark (full dataset). MATH-500 subset in thinking mode may yield higher scores.
GSM8K
accuracy · 4-shot CoT, base model Qwen3-8B-Base
89.84%
📅 14 May 2025📄 Qwen3 Technical Report, Table 6 (arXiv 2505.09388)
MGSM
accuracy · 8-shot CoT, multilingual math, base model Qwen3-8B-Base
76.02%
📅 14 May 2025📄 Qwen3 Technical Report, Table 6 (arXiv 2505.09388)
SWE-bench
pass@1 · post-training, thinking mode
—%
📅 14 May 2025📄 Qwen3 Technical Report (arXiv 2505.09388) — patrz wyniki instruktu
Specific 8B score not separately published in available sources.
LiveCodeBench
pass@1 · post-training, thinking mode
—%
📅 14 May 2025📄 Qwen3 Technical Report (arXiv 2505.09388)
Flagship Qwen3-235B achieves 70.7. The 8B score is not separately published.
BFCL (Berkeley Function-Calling Leaderboard)
accuracy · post-training, function calling
—%
📅 14 May 2025📄 Qwen3 Technical Report (arXiv 2505.09388)
Flagship Qwen3-235B achieves 70.8. The 8B score is not separately published.
IFEval
accuracy · post-training, non-thinking mode
—%
📅 14 May 2025📄 Qwen3 Technical Report (arXiv 2505.09388)
The 8B score is not separately published in available sources.
AIME 2024
pass@1 · post-training, thinking mode
—%
📅 14 May 2025📄 Qwen3 Technical Report (arXiv 2505.09388)
Flagship Qwen3-235B achieves 85.7. The 8B score is not separately published.
AIME 2025
pass@1 · post-training, thinking mode
—%
📅 14 May 2025📄 Qwen3 Technical Report (arXiv 2505.09388)
Flagship Qwen3-235B achieves 81.5. The 8B score is not separately published.
Technical architecture
Core Architecture
Model Form
Training Techniques