DeepSeek V3

V3 · Family: DeepSeek

Open-weight Mixture-of-Experts language model with 671B total parameters (37B activated per token), developed by DeepSeek AI and released in December 2024.

Active · Public access · Open weights · LLM · DeepSeek family

Context window: 128K tokens

Parameters: 671B total, 37B activated

Max output: 8,192 tokens

Release date: 26 December 2024

Producer: DeepSeek AI

Access: API, download · Deployment: cloud, local

Overview

DeepSeek-V3 is an open-weights Mixture-of-Experts (MoE) language model developed by DeepSeek AI, released on December 26, 2024.

Architecture and Specifications

The model has 671 billion total parameters, of which 37 billion are activated for each token: in the Mixture-of-Experts architecture, a learned router sends each token to a small subset of experts. Context window: 128,000 tokens. Maximum output: 8,192 tokens. Knowledge cutoff: July 2024. The weights are published on Hugging Face, and the GitHub repository provides instructions for running the model locally.
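In a Mixture-of-Experts layer, a router scores every expert for each token, but only the top-k experts are actually evaluated, which is how most of the parameters can sit idle for any given token. The pure-Python sketch below illustrates generic top-k routing only. The sigmoid-then-renormalize gating, the toy sizes, and the helper names (`top_k_route`, `moe_layer`) are illustrative assumptions, and it omits shared experts, load balancing, and everything else a production implementation needs.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def top_k_route(scores, k):
    """Pick the k highest-scoring experts and renormalize their scores to sum to 1."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]

def moe_layer(token, experts, router_weights, k):
    """Combine the outputs of only the k routed experts, weighted by their gate values."""
    # Affinity of the token to each expert: sigmoid of a dot product (assumed gating form).
    scores = [sigmoid(sum(w * x for w, x in zip(row, token))) for row in router_weights]
    output = [0.0] * len(token)
    for i, gate in top_k_route(scores, k):
        # Unselected experts are never evaluated for this token.
        expert_out = experts[i](token)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output

# Toy sizes for illustration only (hypothetical, not DeepSeek-V3's configuration).
random.seed(0)
dim, num_experts, k = 4, 16, 2
router = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
# Each hypothetical "expert" just scales its input by a different constant.
experts = [lambda x, s=j + 1: [s * v for v in x] for j in range(num_experts)]
token = [0.5, -1.0, 0.25, 2.0]

print(moe_layer(token, experts, router, k))
print(f"Activated share in DeepSeek-V3: {37 / 671:.1%}")  # -> Activated share in DeepSeek-V3: 5.5%
```

Only two of the sixteen toy experts run per token here; in the real model the same principle keeps roughly 5.5% of the weights active per token.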

Benchmark Results

According to the official technical report (arXiv:2412.19437), the chat model scores: MMLU 88.5%, MMLU-Pro 75.9%, GPQA Diamond 59.1%, MATH-500 90.2%, HumanEval-Mul 82.6%, LiveCodeBench 40.5%, AIME 2024 39.2%, and DROP (3-shot F1) 91.6%.

Availability and Pricing

The model is available through the DeepSeek API (api-docs.deepseek.com) and can be self-hosted from the weights published on Hugging Face. Standard API prices announced at launch: $0.07 per million input tokens on a cache hit, $0.27 per million input tokens on a cache miss, and $1.10 per million output tokens. As of December 2025, the deepseek-chat model name in the API points to DeepSeek-V3.2.
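To see how the three prices combine, the sketch below estimates the cost of a single request from its token counts. The function name and the token counts are hypothetical; how many input tokens hit the cache is decided by the API, so the hit/miss split is an input assumption, and the prices are the ones quoted above, which may have changed since.

```python
# Per-million-token prices quoted above (USD); check current pricing before relying on them.
PRICE_INPUT_CACHE_HIT = 0.07
PRICE_INPUT_CACHE_MISS = 0.27
PRICE_OUTPUT = 1.10
TOKENS_PER_UNIT = 1_000_000

def estimate_cost(cached_input_tokens: int, uncached_input_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in USD (hypothetical helper)."""
    return (cached_input_tokens * PRICE_INPUT_CACHE_HIT
            + uncached_input_tokens * PRICE_INPUT_CACHE_MISS
            + output_tokens * PRICE_OUTPUT) / TOKENS_PER_UNIT

# Example: a 100K-token prompt where 80K tokens hit the cache, plus a 2K-token reply.
cost = estimate_cost(80_000, 20_000, 2_000)
print(f"${cost:.4f}")  # prints $0.0132
```

Even with a short reply, output tokens cost roughly four times as much as cache-miss input tokens, so long generations dominate the bill.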

Applications

Chatbots, data analysis, document analysis, summarization, translation

Platforms

Hugging Face Hub

Technical specification

Knowledge cutoff: 31 July 2024

License: DeepSeek License v1.0

Hardware requirements

Local deployment requires server-class, multi-GPU infrastructure; the GitHub repository includes instructions for local inference. Without such hardware, the model remains accessible through the DeepSeek API.
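A rough sense of what "server-class" means comes from the size of the weights alone. Because different tokens are routed to different experts, all 671 billion parameters must be resident in memory, not just the 37 billion active per token. This back-of-envelope sketch assumes 1 byte per parameter for FP8 and 2 bytes for BF16, and ignores the KV cache, activations, and runtime overhead, so real requirements are higher; the helper name is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Every expert must be loaded, so use the total parameter count, not the 37B active.
TOTAL_PARAMS = 671e9

for fmt, nbytes in [("FP8", 1), ("BF16", 2)]:
    print(f"{fmt}: ~{weight_memory_gb(TOTAL_PARAMS, nbytes):,.0f} GB")
# FP8: ~671 GB
# BF16: ~1,342 GB
```

Both figures exceed the memory of any single accelerator, which is why the model is normally sharded across several GPUs or nodes.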

Features: tool use, fine-tuning

Modalities

Input: text

Output: text, code, structured data, summaries, reports

Capabilities and applications

Native model capabilities

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Multi-step reasoning

Carrying out multi-step chains of reasoning across long, complex tasks.

Category: reasoning

Long context

Maintaining coherence and focus across very long input context.

Category: language

Coding

Generating, analysing and modifying source code.

Category: coding

Function calling

Calling external tools or functions that the caller describes in the request.

Category: planning

Structured output

Producing data in structured formats such as JSON.

Category: structured_generation

Multilingual

Understanding and generating text in many languages.

Category: language

Planning

Forming and executing action plans for complex tasks.

Category: planning

Streaming output

Returning the response incrementally, token by token, as it is generated.

Benchmark results

MMLU-Pro (EM, chat model, standard benchmarks): 75.9%