Grok-2

2 · Family: Grok

xAI's second-generation flagship model, with multimodal capabilities and image generation via FLUX by Black Forest Labs. Weights are available on Hugging Face (~500 GB, 42 files). Running them requires 8 GPUs with more than 40 GB of memory each.

⚠ Deprecated · ✓ Public access · ⚖ Open weights · LLM · Multimodal · 📁 Grok

Context window

131K

tokens

Parameters

undisclosed

parameters

Release date

20 August 2024

🏢 SpaceXAI · Producer

Access: API · Download · Hosted

Deployment: ☁ Cloud · 💻 Local

Overview

Grok-2 is xAI's multimodal frontier language model, announced on 13 August 2024 and rolled out to X Premium and Premium+ subscribers. In xAI's official August 2024 benchmarks it scored 56.0% on GPQA, 87.5% on MMLU, 75.5% on MMLU-Pro, 76.1% on MATH, 88.4% on HumanEval, 66.1% on MMMU, 69.0% on MathVista, and 93.6% on DocVQA. An early version, tested on the LMSYS Chatbot Arena under the codename "sus-column-r", outperformed Claude 3.5 Sonnet and GPT-4-Turbo on overall Elo at the time. The model integrates image generation through a partnership with Black Forest Labs (FLUX.1). In August 2025 xAI released the Grok-2 weights on Hugging Face under the xAI Community License Agreement (source-available, with commercial-use restrictions). The checkpoint is ~500 GB across 42 files and is sharded for tensor parallelism of 8 (TP=8), so it requires 8 GPUs with more than 40 GB of memory each and is served with FP8 quantization. xAI has not officially disclosed the exact parameter count.
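The hardware figures above can be sanity-checked with a rough calculation. The sketch below is illustrative only: the checkpoint size, GPU count, and 40 GB threshold come from the text, while the bytes-per-parameter values are assumptions not confirmed by xAI (a 16-bit checkpoint quantized to 8-bit FP8). It also ignores KV cache, activations, and runtime overhead, which take up part of the remaining memory.

```python
# Rough per-GPU weight-memory estimate for a tensor-parallel deployment.
# Only CHECKPOINT_GB, NUM_GPUS and MIN_GPU_MEMORY_GB come from the text;
# the bytes-per-parameter values are assumptions for illustration.

CHECKPOINT_GB = 500        # approximate checkpoint size (stated above)
NUM_GPUS = 8               # TP=8 (stated above)
MIN_GPU_MEMORY_GB = 40     # each GPU needs more than this (stated above)

# ASSUMPTION: the published checkpoint stores 2 bytes per parameter
# (for example BF16) and FP8 quantization stores 1 byte per parameter.
CHECKPOINT_BYTES_PER_PARAM = 2
FP8_BYTES_PER_PARAM = 1


def per_gpu_weight_gb(checkpoint_gb, num_gpus, src_bytes, dst_bytes):
    """Weights held by each GPU after quantization, assuming an even split."""
    quantized_total_gb = checkpoint_gb * dst_bytes / src_bytes
    return quantized_total_gb / num_gpus


shard_gb = per_gpu_weight_gb(
    CHECKPOINT_GB, NUM_GPUS, CHECKPOINT_BYTES_PER_PARAM, FP8_BYTES_PER_PARAM
)
headroom_gb = MIN_GPU_MEMORY_GB - shard_gb

print(f"weights per GPU: ~{shard_gb:.2f} GB")
print(f"headroom on a {MIN_GPU_MEMORY_GB} GB GPU: ~{headroom_gb:.2f} GB")
```

Under these assumptions each GPU holds about 31 GB of weights, which is consistent with the stated requirement of more than 40 GB per GPU. If the checkpoint is stored at a different precision, the estimate changes accordingly.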

Classification

LLM · Multimodal

Family: Grok

Access & deployment

API · Download · Hosted

Cloud · Local

Weights: Open weights

Key parameters

📏 Context: 131K

🧩 Parameters: undisclosed

📥 Input: text, image

Technical specification

Context window

131K

tokens

Parameters

undisclosed

parameters

License

xAI Community License Agreement

Modalities

⬇ Input

text, image

⬆ Output

text, image

Capabilities and applications

Native model capabilities

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Coding

Generating, analysing and modifying code in many programming languages. Covers writing functions, debugging, refactoring, code review, and creating tests. Measured by benchmarks such as HumanEval and SWE-bench.

Category: coding

Image understanding

Analysing and interpreting the content of images.

Category: vision

Multilingual

Competence in many natural languages (from a few to over a hundred): understanding, generation, translation, and code-switching within a single conversation. Frontier models support a wide range of languages with comparable quality.

Category: language

Multi-step reasoning

Carrying out multi-step chains of reasoning across long, complex tasks.

Category: reasoning

Benchmark results

8 benchmarks

GPQA

0-shot CoT (xAI eval, Aug 2024)

56.0%