Qwen3-8B

Qwen3-8B · Family: Qwen3

Alibaba's Qwen3-8B language model: 8.2B parameters, released under Apache 2.0. Hybrid thinking/non-thinking modes, 128K context, support for 119 languages and dialects, strong in math, coding and agent tasks.

Active · Public access · Open source · LLM · Reasoning model · Tool-using model · Qwen3

Context window: 128K tokens

Parameters: 8.2B

Max output: 32,768 tokens

Release date: 29 April 2025

Producer: Alibaba

Access: Download, API, Hosted. Deployment: Local, Cloud, On-device.

Overview

Qwen3-8B is a post-trained (instruct) language model from the Qwen3 family, developed by the Qwen Team at Alibaba Group and released on 29 April 2025 under the Apache 2.0 licence. It is a dense model with 8.2 billion parameters (6.95B non-embedding), built from 36 Transformer layers with grouped-query attention (GQA; 32 query heads, 8 KV heads). The Qwen3 series spans dense models from 0.6B to 32B and MoE models from 30B to 235B total parameters.
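The attention layout described above determines how much memory the key/value cache needs per token. The following is a rough estimate only: the head dimension of 128 and the 2-byte (BF16) cache entries are assumptions that this page does not state, and the function name is illustrative.

```python
def kv_cache_bytes(tokens: int,
                   layers: int = 36,        # from the overview above
                   kv_heads: int = 8,       # GQA: 8 KV heads shared by 32 query heads
                   head_dim: int = 128,     # ASSUMPTION: not stated on this page
                   bytes_per_value: int = 2  # ASSUMPTION: BF16 cache entries
                   ) -> int:
    """Approximate KV-cache size in bytes for one sequence of `tokens` tokens."""
    # Each layer stores one key vector and one value vector per KV head per token.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token


gqa_bytes = kv_cache_bytes(32_768)
# Hypothetical comparison: the same model without KV sharing (32 KV heads).
mha_bytes = kv_cache_bytes(32_768, kv_heads=32)
print(f"GQA: {gqa_bytes / 2**30:.2f} GiB, no KV sharing: {mha_bytes / 2**30:.2f} GiB")
```

Under these assumptions, sharing 8 KV heads across 32 query heads cuts the cache to a quarter of what full multi-head attention would need.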

Hybrid thinking mode

The key innovation of Qwen3 is support for two modes within a single model. In thinking mode (enable_thinking=True) the model generates step-by-step reasoning inside a <think>…</think> block, followed by the final answer. In non-thinking mode (enable_thinking=False) it answers directly without explicit reasoning, like a traditional chat model. The mode can be switched per turn with the /think and /no_think flags in the prompt, or set through the enable_thinking parameter of the chat template. Recommended sampling settings are Temperature=0.6, TopP=0.95, TopK=20 for thinking mode and Temperature=0.7, TopP=0.8, TopK=20 for non-thinking mode.
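A client has to do two things with these modes: pick the matching sampling settings and separate the reasoning block from the final answer. The sketch below shows one way to do that in plain Python. The helper names (`sampling_for`, `split_thinking`) are hypothetical, and the parser assumes the reasoning arrives as a literal <think>…</think> span at the start of the generated text.

```python
import re
from typing import Dict, Tuple

# Sampling presets copied from the recommendations above.
SAMPLING_PRESETS: Dict[bool, Dict[str, float]] = {
    True: {"temperature": 0.6, "top_p": 0.95, "top_k": 20},   # thinking mode
    False: {"temperature": 0.7, "top_p": 0.8, "top_k": 20},   # non-thinking mode
}

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def sampling_for(enable_thinking: bool) -> Dict[str, float]:
    """Return a copy of the recommended sampling settings for the chosen mode."""
    return dict(SAMPLING_PRESETS[enable_thinking])


def split_thinking(text: str) -> Tuple[str, str]:
    """Split generated text into (reasoning, answer).

    If no <think> block is present, the reasoning part is empty.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()


reasoning, answer = split_thinking("<think>2 + 2 = 4</think>\n\nThe answer is 4.")
print(sampling_for(True), "|", reasoning, "|", answer)
```

Keeping the presets in one table makes it harder to pair one mode with the other mode's sampling settings by accident.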

Architecture and pretraining

The architecture is a Transformer with GQA, SwiGLU, RoPE and RMSNorm, similar to Qwen2.5, but it drops the QKV bias and adds QK-Norm for training stability. The model natively supports a 32,768-token context, extendable to 131,072 tokens (the 128K figure quoted on this page) with YaRN. Pre-training covered approximately 36 trillion tokens in three stages: a general stage (about 30T tokens), a STEM/code-heavy stage (about 5T tokens) and a long-context stage that raised the training sequence length to 32K. The corpus was expanded with data extracted and synthesised using the Qwen2.5-VL, Qwen2.5-Math and Qwen2.5-Coder models.
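The YaRN extension mentioned above comes down to a scaling factor applied to the rotary position embeddings. The sketch below computes that factor and packs it into a `rope_scaling`-style dictionary. Two things are assumptions taken from the upstream Qwen3 model card rather than from this page: the 32,768-token window before scaling, and the key names (`rope_type`, `factor`, `original_max_position_embeddings`). Check both against the runtime you use.

```python
from typing import Dict, Union


def yarn_rope_scaling(native_len: int, target_len: int) -> Dict[str, Union[str, float, int]]:
    """Build a rope_scaling-style dict that stretches native_len to target_len."""
    if target_len <= native_len:
        raise ValueError("target_len must exceed native_len; no scaling is needed")
    return {
        "rope_type": "yarn",                             # ASSUMED key name and value
        "factor": target_len / native_len,               # how far the window is stretched
        "original_max_position_embeddings": native_len,  # ASSUMED key name
    }


# ASSUMPTION: 32,768 tokens before scaling, 131,072 tokens after.
config_patch = {"rope_scaling": yarn_rope_scaling(32_768, 131_072)}
print(config_patch)
```

A factor sized for the longest expected input is usually preferable to always using the maximum, since static scaling can affect quality on short inputs.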

Post-training (4 stages)

The Qwen3 post-training pipeline has 4 stages: (1) Long-CoT cold start: fine-tuning on CoT data from mathematics, code and STEM. (2) Reasoning RL: reinforcement learning with rule-based rewards (GRPO). (3) Thinking Mode Fusion: integration of the non-thinking mode via SFT on mixed CoT + instruction data. (4) General RL: RL across 20+ general tasks (instruction following, format compliance, agent tasks). The full pipeline was applied to the larger models; smaller models, including the 8B, were instead trained via strong-to-weak distillation from those larger models.
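The Reasoning RL stage pairs a rule-based reward with GRPO, which scores each sampled answer relative to the other answers drawn for the same prompt. The sketch below shows only that group-relative normalisation step, not a training loop. The exact-match reward rule, the use of the population standard deviation and the epsilon value are illustrative assumptions, not details confirmed by this page.

```python
from statistics import mean, pstdev
from typing import List


def rule_based_reward(answer: str, reference: str) -> float:
    """Hypothetical rule: 1.0 for an exact match after trimming whitespace, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalise each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # ASSUMPTION: population standard deviation
    # eps keeps the division defined when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt whose reference answer is "42".
samples = ["42", "41", " 42 ", "forty-two"]
rewards = [rule_based_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

Because advantages are computed within the group, no separate value model is needed to provide a baseline.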

Multilingual support and agent capabilities

The model supports 119 languages and dialects (vs. 29 in Qwen2.5), covering Indo-European, Sino-Tibetan, Afro-Asiatic, Austronesian and other language families. For agent use, Qwen3-8B can call external tools, including through MCP (Model Context Protocol), and supports function calls in both thinking and non-thinking modes. Qwen-Agent is the recommended agent framework.
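On the client side, a function call means extracting the requested tool name and arguments from the model output and dispatching to local code. The sketch below assumes the call arrives as JSON wrapped in <tool_call>…</tool_call> tags, which is an assumption about the chat template rather than something this page states. The `get_weather` tool and the `run_tool_calls` helper are hypothetical, and in practice a framework such as Qwen-Agent would handle this step.

```python
import json
import re
from typing import Any, Callable, Dict, List

# ASSUMPTION: each tool call is a JSON object wrapped in <tool_call> tags.
_TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def get_weather(city: str) -> str:
    """Hypothetical tool used only for this illustration."""
    return f"Sunny in {city}"


TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(model_output: str, tools: Dict[str, Callable[..., Any]]) -> List[Any]:
    """Parse every tool call in the output and invoke the matching function."""
    results = []
    for raw in _TOOL_CALL_RE.findall(model_output):
        call = json.loads(raw.strip())
        func = tools[call["name"]]  # raises KeyError for an unknown tool
        results.append(func(**call.get("arguments", {})))
    return results


output = (
    "<think>The user wants the weather.</think>\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
)
print(run_tool_calls(output, TOOLS))
```

Looking tools up in an explicit registry means the model can only trigger functions that were deliberately exposed.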

Technical specification

Context window: 128K tokens

Parameters: 8.2B

Max output tokens: 32,768 tokens per response

Knowledge cutoff: 1 Apr 2025

License: Apache 2.0

Hardware requirements

A GPU with roughly 16 GB of VRAM or more is needed for BF16 inference. Flash Attention 2 is recommended. Supported runtimes: Transformers (>=4.51.0), vLLM (>=0.8.5), SGLang (>=0.4.6.post1), llama.cpp (>=b5401), Ollama.
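The VRAM figure follows from the parameter count multiplied by the bytes stored per parameter. The sketch below does that arithmetic for the weights alone. The 4-bit case is a hypothetical comparison, not a configuration described on this page, and the KV cache, activations and runtime overhead come on top.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


bf16_gb = weight_memory_gb(8.2, 2.0)   # 16-bit weights, 2 bytes each
int4_gb = weight_memory_gb(8.2, 0.5)   # hypothetical 4-bit quantisation
print(f"BF16: {bf16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
```

By this estimate the BF16 weights alone come to about 16.4 GB, so a 16 GB card has essentially no headroom left for the cache.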

Features: Tool use, Fine-tuning

Modalities

Input: text

Output: text, code

Capabilities and applications

Native model capabilities

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Multi-step reasoning

Carrying out multi-step chains of reasoning across long, complex tasks.

Category: reasoning

Mathematical reasoning

The model's ability to solve mathematical tasks requiring multi-step reasoning — equations, proofs, combinatorics, geometry, calculus and competition-level problems.

Category: reasoning

Coding

Generating, analysing and modifying source code.

Category: coding

Multilingual

Understanding and generating text in many languages.

Category: language

Long context

Maintaining coherence and focus across very long input context.

Category: language

Agentic capability

The model's ability to autonomously plan and execute multi-step tasks by sequentially using tools, maintaining context, and adapting to intermediate results.

Category: planning

Planning

Forming and executing action plans for complex tasks.

Category: planning

Function Calling

Category: planning

Structured output

Producing data in structured formats such as JSON.

Category: structured_generation

Language modeling

Ability to predict subsequent tokens and generate coherent natural-language text based on the preceding context.

Category: language

Application domains

Coding Research assistance Data analysis Chatbots Creative writing Brainstorming

Benchmark results

12 benchmarks

MMLU

accuracy · 5-shot, base model Qwen3-8B-Base

76.89%