GLM-5.1

5.1 · Family: GLM

Flagship open-source agentic model from Z.ai (Zhipu AI), based on a Mixture-of-Experts architecture with 744B total parameters and 40B active parameters.

✓ Active✓ Public access⚖ Open weightsLLMReasoning model📁 GLM

Context window

200K

tokens

Parameters

744B total (40B active per token)

parameters

Max output

128

tokens

Release date

7 April 2026

🏢Z.aiProducer

Access:APIDownloadDeployment:💻 Local☁ Cloud

Overview

GLM-5.1 is the flagship next-generation open-source model from Z.ai (Zhipu AI), designed primarily for agentic engineering tasks and AI-assisted programming. It uses a Mixture-of-Experts (MoE) architecture with a DeepSeek Sparse Attention (DSA) mechanism — 744B parameters in total, with 40B active per token (256 routed experts + 1 shared, 8 active per token).

Compared to its predecessor (GLM-5), the model shows substantial improvements on agentic benchmarks: SWE-Bench Pro (58.4%), BrowseComp (68.0%), and CyberGym (68.7%). A key characteristic of GLM-5.1 is its ability to operate effectively over long horizons — the model independently plans, runs experiments, analyzes results, and iterates on its strategy across hundreds of rounds and thousands of tool calls. It supports thinking mode (chain-of-thought reasoning, enabled by default), function calling, and code generation.

The model is released under the MIT license, with weights available for download on Hugging Face (zai-org/GLM-5.1) and ModelScope (ZhipuAI/GLM-5.1). Local deployment requires a GPU cluster (at least 8× H800/A100-class GPUs; supported via vLLM, SGLang, KTransformers, and xLLM). It is also available as an API service through the Z.ai platform. The weights were trained on Huawei Ascend 910B hardware.

Classification

LLMReasoning model

Family: GLM

Applications

Coding Research assistance Q&A / Question answering

Access & deployment

APIDownload

LocalCloud

Weights: Open weights

Key parameters

📏 Context: 200K

🧩 Parameters: 744B total (40B active per token)

✓ Tools

📥 Input: text

Platforms

Hugging Face Hub

Technical specification

Context window

200K

tokens

Parameters

744B total (40B active per token)

parameters

Max output tokens

128

tokens per response

License

MIT

Hardware requirements

Trained on Huawei Ascend 910B (no Nvidia). Local deployment requires an enterprise GPU cluster. Full BF16 model ~1.49 TB.

Features:✓ Tool use

Modalities

⬇ Input

text

⬆ Output

textcodestructured_data

Capabilities and applications

Native model capabilities

Coding

Generating, analysing and modifying source code.

Category: coding

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Multi-step reasoning

Carrying out multi-step chains of reasoning across long, complex tasks.

Category: reasoning

Multilingual

Understanding and generating text in many languages.

Category: language

Planning

Forming and executing action plans for complex tasks.

Category: planning

Structured output

Producing data in structured formats such as JSON.

Category: structured_generation

Function Calling

Category: planning

Long context

Maintaining coherence and focus across very long input context.

Category: language

Streaming output

Category: reasoning

Application domains

Coding Research assistance Q&A / Question answering

Benchmark results

7 benchmarks

SWE-bench

58.4%

📅 7 Apr 2026📄 Z.ai (self-reported)

Self-reported by Z.ai. Ranked first among all models on SWE-Bench Pro at the date of publication. Result has not been independently verified.

GPQA

86.2%

📅 7 Apr 2026📄 Z.ai (self-reported)

Result self-reported by Z.ai from the official model card on HuggingFace.

HLE (Humanity's Last Exam)

accuracy · without tools

31.0%