Mixtral 8x7B

8x7B v0.1 · Family: Mistral

Open-weights sparse mixture-of-experts model from Mistral AI: 46.7B total parameters (12.9B active per token), 32K context window, Apache 2.0 license.

⚠ Deprecated · ✓ Public access · ⚖ Open source · LLM · 📁 Mistral

Context window: 32K tokens

Parameters: 46.7B total / 12.9B active

Release date: 11 December 2023

🏢 Producer: Mistral AI

Access: API, Download · Deployment: 💻 Local, ☁ Cloud

Overview

Mixtral 8x7B is a decoder-only Sparse Mixture-of-Experts (SMoE) language model released by Mistral AI on December 11, 2023 under the Apache 2.0 license. The feed-forward block of each layer contains 8 distinct groups of parameters (the "experts"). At every layer and for every token, a router network selects 2 of the 8 experts to process the token and combines their outputs as a sum weighted by the router's gating scores. As a result, the model has 46.7B total parameters but uses only about 12.9B per token, so its inference speed and cost are roughly those of a 12.9B model.
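The routing step described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not Mistral's implementation: following the formulation in the Mixtral paper, the gating weights are a softmax over the top-2 router logits, and the layer output is the gate-weighted sum of the two selected experts' outputs. The names `top2_moe` and `make_scaling_expert` and the toy "scale the input" experts are assumptions for illustration; real Mixtral experts are SwiGLU feed-forward networks.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_moe(x: Vector, router_weights: Sequence[Vector],
             experts: Sequence[Expert], k: int = 2) -> Vector:
    """Sparse MoE feed-forward step for one token (illustrative sketch)."""
    # Router: a linear layer producing one logit per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Select the k experts with the highest logits.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Gating weights: softmax over the selected logits only.
    gates = softmax([logits[i] for i in chosen])
    # Gate-weighted sum of the selected experts' outputs; the other experts never run.
    out = [0.0] * len(x)
    for g, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out


def make_scaling_expert(scale: float) -> Expert:
    """Toy stand-in for an expert (hypothetical); real experts are SwiGLU MLPs."""
    return lambda v: [scale * vi for vi in v]


if __name__ == "__main__":
    experts = [make_scaling_expert(i + 1) for i in range(8)]
    router = [[float(i), 0.0] for i in range(8)]  # for x = [1, 0], expert i gets logit i
    print(top2_moe([1.0, 0.0], router, experts))
```

With this toy router, experts 7 and 6 (scales 8 and 7) are selected with gates of about 0.73 and 0.27, so the first output component is about 7.73. Only the two selected experts are evaluated, which is why per-token compute tracks the active rather than the total parameter count.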

The model supports a 32K-token context window and handles five languages: English, French, Italian, German and Spanish. Its Instruct variant, optimized with supervised fine-tuning (SFT) and direct preference optimization (DPO), scores 8.30 on MT-Bench. Mixtral 8x7B was distributed both as downloadable weights and through the Mistral API as open-mixtral-8x7b; it was deprecated on November 30, 2024 and retired from the Mistral API on March 30, 2025.

Classification

LLM

Family: Mistral

Access & deployment

API, Download

Local, Cloud

Weights: Open source

Key parameters

📏 Context: 32K

🧩 Parameters: 46.7B total / 12.9B active

✓ Fine-tuning

📥 Input: text
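The 46.7B total / 12.9B active split follows from the fact that only the expert feed-forward weights are sparse; attention, router, normalization and embedding weights are used for every token. The sketch below redoes the arithmetic under stated assumptions: the dimensions in `MixtralConfig` are taken from the publicly released Hugging Face configuration for Mixtral-8x7B-v0.1 (they are not stated on this page), and the model is assumed to use bias-free linear layers and untied input/output embeddings. `MixtralConfig` and `param_counts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MixtralConfig:
    # Assumed values from the public Hugging Face config for Mixtral-8x7B-v0.1.
    dim: int = 4096        # hidden size
    n_layers: int = 32
    n_heads: int = 32
    n_kv_heads: int = 8    # grouped-query attention: fewer key/value heads
    head_dim: int = 128
    ffn_dim: int = 14336   # intermediate size of each expert
    vocab: int = 32000
    n_experts: int = 8
    top_k: int = 2


def param_counts(c: MixtralConfig) -> Tuple[int, int]:
    """Return (total, active-per-token) parameter counts, assuming no bias terms."""
    expert = 3 * c.dim * c.ffn_dim  # SwiGLU expert: w1, w2, w3
    # Query and output projections, plus the smaller key and value projections.
    attn = 2 * c.dim * c.n_heads * c.head_dim + 2 * c.dim * c.n_kv_heads * c.head_dim
    router = c.dim * c.n_experts    # one logit per expert
    norms = 2 * c.dim               # two RMSNorm weight vectors per layer
    embed_and_head = 2 * c.vocab * c.dim  # input embedding + untied output head
    # Weights used by every token: per-layer shared parts, embeddings, final norm.
    shared = c.n_layers * (attn + router + norms) + embed_and_head + c.dim
    total = shared + c.n_layers * c.n_experts * expert
    active = shared + c.n_layers * c.top_k * expert
    return total, active


total, active = param_counts(MixtralConfig())
print(f"total ≈ {total / 1e9:.1f}B, active ≈ {active / 1e9:.1f}B")
```

Running it prints `total ≈ 46.7B, active ≈ 12.9B`, matching the figures listed above. Almost the entire gap comes from evaluating 2 rather than 8 experts per layer; everything outside the experts is only about 1.6B parameters.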

Technical specification


License: Apache 2.0

Features: ✓ Fine-tuning

Modalities

⬇ Input: text

⬆ Output: text, code

Capabilities and applications

Native model capabilities

Language modeling

Ability to predict subsequent tokens and generate coherent natural-language text based on the preceding context.

Category: language

Coding

Generating, analysing and modifying code in many programming languages. Covers writing functions, debugging, refactoring, code review, and creating tests. Measured by benchmarks such as HumanEval and SWE-bench.

Category: coding

Multilingual

Competence in multiple natural languages: understanding, generation, translation, and code-switching within a single conversation. Coverage ranges from a few languages to over a hundred depending on the model; Mixtral 8x7B handles English, French, Italian, German and Spanish.

Category: language

Long context

Support for large context windows, from tens of thousands to hundreds of thousands (or millions) of input tokens. Enables analysis of long documents, large codebases, and long multi-turn conversations without losing earlier information. For comparison, GPT-5.1 supports 400,000 tokens, while Mixtral 8x7B's window is 32K tokens.
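Because the prompt and the generated tokens share Mixtral 8x7B's 32K-token window, a client has to budget both, and documents longer than the window must be split. A minimal sketch, assuming "32K" means 32,768 tokens (the `max_position_embeddings` value in the public model configuration) and that token counts come from the model's own tokenizer; `max_new_tokens` and `chunk_tokens` are hypothetical helpers, not part of any Mistral SDK.

```python
from typing import List, Sequence

CONTEXT_WINDOW = 32_768  # assumption: "32K" read as 2**15 tokens


def max_new_tokens(prompt_tokens: int, requested: int, window: int = CONTEXT_WINDOW) -> int:
    """Clamp a generation budget so that prompt + output fit in the context window."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = window - prompt_tokens
    if remaining <= 0:
        raise ValueError(f"prompt of {prompt_tokens} tokens leaves no room in a {window}-token window")
    return min(requested, remaining)


def chunk_tokens(tokens: Sequence[int], chunk_size: int, overlap: int = 0) -> List[List[int]]:
    """Split tokens into chunks of at most chunk_size; consecutive chunks share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("require 0 <= overlap < chunk_size")
    if not tokens:
        return []
    step = chunk_size - overlap
    # Start positions stop once a chunk reaches the end, so no chunk is redundant.
    starts = range(0, max(len(tokens) - overlap, 1), step)
    return [list(tokens[i:i + chunk_size]) for i in starts]


if __name__ == "__main__":
    print(max_new_tokens(30_000, 4_096))  # room left after a 30,000-token prompt
    print(chunk_tokens(list(range(10)), 4, overlap=1))
```

For example, a 30,000-token prompt leaves room for at most 2,768 generated tokens. The overlap in `chunk_tokens` repeats the last tokens of each chunk at the start of the next, so text that straddles a boundary is seen whole in at least one chunk when the chunk is at least as long as that text.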

Category: language

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning