Robots Atlas>ROBOTS ATLAS
Mistral Large 3

Mistral Large 3

v25.12ย ยทย Family: Mistral
Open-weight multimodal model from Mistral AI (December 2025), built on a Mixture-of-Experts architecture with 41B active and 675B total parameters, 256k context, Apache 2.0 license.
โœ“ Activeโœ“ Public accessโš– Open sourceLLMMultimodal๐Ÿ“ Mistral
Context window
256k
tokens
Parameters
675B total / 41B active
parameters
Release date
2 December 2025
Access:APIDownloadHostedDeployment:โ˜ Cloud๐Ÿ’ป Local

Overview

Mistral Large 3 is a multimodal language model announced by Mistral AI on December 2, 2025, as part of the Mistral 3 family. The model uses a sparse Mixture-of-Experts (MoE) architecture with 41B active and 675B total parameters, and a 256k context window.

It was trained from scratch on 3,000 NVIDIA H200 GPUs and is Mistral AIโ€™s first MoE model since the Mixtral series. Both base and instruction-fine-tuned versions are released under the Apache 2.0 license. The model supports image understanding (multimodal input) and multilingual conversations.

Mistral Large 3 is available through Mistral AI Studio, Amazon Bedrock, Azure AI Foundry, Hugging Face, IBM WatsonX, OpenRouter, Fireworks, and Together AI. The model weights can run on a single 8ร—A100 or 8ร—H100 node with vLLM, and in NVFP4 format on Blackwell NVL72 systems.

Classification
LLMMultimodal
Family: Mistral
Access & deployment
APIDownloadHosted
CloudLocal
Weights: Open source
Key parameters
๐Ÿ“ Context: 256k
๐Ÿงฉ Parameters: 675B total / 41B active
๐Ÿ“ฅ Input: text, image

Technical specification

Context window
256k
tokens
Parameters
675B total / 41B active
parameters
License
Apache 2.0
Modalities
โฌ‡ Input
textimage
โฌ† Output
textcode

Capabilities and applications

Native model capabilities
Image understanding
Analysing and interpreting the content of images.
Category: vision
Multilingual
Understanding and generating text in many languages.
Category: language
Multimodal understanding
Category: multimodal
Function Calling
Category: planning
Structured output
Producing data in structured formats such as JSON.
Category: structured_generation
Long context
Maintaining coherence and focus across very long input context.
Category: language
OCR
Recognising text within images and documents.
Category: vision

Technical architecture

Core Architecture
Training Techniques