Open-weight Mixture-of-Experts language model with 671B total parameters (37B activated per token), developed by DeepSeek AI and released in December 2024.
Context window
128K
tokens
Parameters
671B total, 37B activated
parameters
Max output
8,192
tokens
Release date
26 December 2024
Access:APIDownloadDeployment:☁ Cloud💻 Local
Overview
Access & deployment
APIDownload
CloudLocal
Weights: Open weights
Key parameters
📏 Context: 128K
🧩 Parameters: 671B total, 37B activated
✓ Tools · ✓ Fine-tuning
📥 Input: text
Platforms
Technical specification
Context window
128K
tokens
Parameters
671B total, 37B activated
parameters
Max output tokens
8,192
tokens per response
Knowledge cutoff
31 Jul 2024
Knowledge boundary
License
DeepSeek License v1.0
Hardware requirements
Local deployment requires server-class infrastructure/GPU; the model is also officially available as open weights and via the DeepSeek API. The repository includes local inference instructions.
Features:✓ Tool use✓ Fine-tuning
Modalities
⬇ Input
text
⬆ Output
textcodestructured_datasummariesreports
Capabilities and applications
Native model capabilities
Reasoning
The model's ability to reason logically and solve complex problems.
Category: reasoning
Multi-step reasoning
Carrying out multi-step chains of reasoning across long, complex tasks.
Category: reasoning
Long context
Maintaining coherence and focus across very long input context.
Category: language
Coding
Generating, analysing and modifying source code.
Category: coding
Function Calling
Category: planning
Structured output
Producing data in structured formats such as JSON.
Category: structured_generation
Multilingual
Understanding and generating text in many languages.
Category: language
Planning
Forming and executing action plans for complex tasks.
Category: planning
Streaming output
Category: reasoning
Application domains
Benchmark results
9 benchmarks
MMLU-Pro
EM · chat model standard benchmarks
75.9%
📅 27 Dec 2024📄 DeepSeek-V3 Technical Report / GitHub repository
Score for DeepSeek-V3 from the benchmark table published alongside the repository and technical report.
GPQA-Diamond
Pass@1 · chat model standard benchmarks
59.1%
📅 27 Dec 2024📄 DeepSeek-V3 Technical Report / GitHub repository
Benchmark for scientific knowledge and reasoning.
HumanEval-Mul
Pass@1 · coding benchmark
82.6%
📅 27 Dec 2024📄 DeepSeek-V3 Technical Report / GitHub repository
Multilingual coding benchmark.
LiveCodeBench
Pass@1-COT · coding benchmark with chain-of-thought style evaluation
40.5%
📅 27 Dec 2024📄 DeepSeek-V3 Technical Report / GitHub repository
Result for the chat model variant.
MATH-500
EM · math benchmark
90.2%
📅 27 Dec 2024📄 DeepSeek-V3 Technical Report / GitHub repository
Mathematical benchmark for the chat model.
MMLU
Exact Match (EM) · Chat model evaluation
88.5%
📄 DeepSeek-V3 Technical Report (arXiv:2412.19437)
Result from the official DeepSeek-V3 technical report (December 2024).
HumanEval
Pass@1 · Chat model evaluation
82.6%
📄 DeepSeek-V3 Technical Report (arXiv:2412.19437)
Result from the official DeepSeek-V3 technical report.
AIME 2024
Pass@1 · Chat model evaluation
39.2%
📄 DeepSeek-V3 Technical Report (arXiv:2412.19437)
Result from the official DeepSeek-V3 technical report.
DROP
3-shot F1 · Chat model evaluation
91.6%
📄 DeepSeek-V3 Technical Report (arXiv:2412.19437)
Result from the official DeepSeek-V3 technical report.
Pricing
Technical architecture
Core Architecture
Training Techniques
Deployment and security
☁ Available on platforms
