Kimi K2.6

K2.6

Moonshot AI's open native multimodal agentic MoE model with 1 trillion total parameters (32B active), 256K context window, and native INT4 quantization.

✓ Active✓ Public access⚖ Open weightsMultimodalReasoning modelTool-using model

Context window

256K

tokens

Parameters

1T total / 32B active

parameters

Max output

98,304

tokens

Release date

21 April 2026

🏢Moonshot AIProducer

Access:APIDownloadHostedDeployment:☁ Cloud💻 Local

Overview

Kimi K2.6 is an open, native multimodal agentic model created by Moonshot AI, released in April 2026. It builds on the architecture and approach established by Kimi K2.5, extending capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration.

Key Features

Long-Horizon Coding — significant improvements on complex, end-to-end coding tasks across Rust, Go, Python and domains including front-end, DevOps and performance optimization.
Coding-Driven Design — transforms simple prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows.
Elevated Agent Swarm — horizontal scaling to 300 sub-agents executing 4,000 coordinated steps, dynamically decomposing tasks into parallel, domain-specialized subtasks.
Proactive & Open Orchestration — supports persistent, 24/7 background agents that proactively manage schedules, execute code and orchestrate cross-platform operations.

Architecture

Kimi K2.6 is a Mixture-of-Experts model with 1T total parameters and 32B active parameters across 61 layers (1 dense), 384 experts per MoE layer, 8 experts selected per token, and 1 shared expert. The attention mechanism is MLA (Multi-head Latent Attention) with 64 heads and 7168 hidden dimension. Activation: SwiGLU. Vocabulary: 160K. Context: 256K tokens. Vision encoder: MoonViT (400M parameters). The model uses native INT4 quantization (same as Kimi K2 Thinking).

Modes and Access

The model supports Thinking mode (recommended temperature 1.0) and Instant mode (recommended temperature 0.6), as well as preserve_thinking (retains reasoning across turns in coding-agent scenarios). Weights are released under the Modified MIT License. The API is available at platform.moonshot.ai (OpenAI and Anthropic compatible). Recommended inference engines: vLLM, SGLang, KTransformers. Dedicated coding-agent framework: Kimi Code CLI (kimi.com/code).

Classification

MultimodalReasoning modelTool-using model

Access & deployment

APIDownloadHosted

CloudLocal

Weights: Open weights

Key parameters

📏 Context: 256K

🧩 Parameters: 1T total / 32B active

✓ Tools · ✓ Fine-tuning

📥 Input: text, image, video

Technical specification

Context window

256K

tokens

Parameters

1T total / 32B active

parameters

Max output tokens

98,304

tokens per response

License

Modified MIT License

Hardware requirements

Recommended inference engines: vLLM, SGLang, KTransformers. Requires transformers >=4.57.1, <5.0.0. Weights provided as safetensors / compressed-tensors with native INT4 quantization.

Features:✓ Tool use✓ Fine-tuning

Modalities

⬇ Input

textimagevideo

⬆ Output

textcode

Capabilities and applications

Native model capabilities

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Multi-step reasoning

Carrying out multi-step chains of reasoning across long, complex tasks.

Category: reasoning

Coding

Generating, analysing and modifying code in many programming languages. Covers writing functions, debugging, refactoring, code review, and creating tests. Measured by benchmarks such as HumanEval and SWE-bench.

Category: coding

Long context

Support for large context windows — tens to hundreds of thousands (or millions) of input tokens. Enables analysis of entire codebases, long documents, and many parallel conversations without losing earlier information. GPT-5.1 supports 400,000 tokens.

Category: language

Agentic capability

The model's ability to autonomously plan and execute multi-step tasks by sequentially using tools, maintaining context, and adapting to intermediate results.

Category: planning

Multimodal understanding

Category: multimodal

Image understanding

Analysing and interpreting the content of images.

Category: vision

Video understanding

The model's ability to analyse and interpret video content — recognising actions, motion, events and relationships between objects over time.

Category: video

Computer use

The model's ability to operate a computer interface by interpreting screenshots and generating actions such as clicks, typing, and navigating applications.

Category: planning

Parallel Tool Calls

Ability to invoke multiple external tools simultaneously while generating a response.

Category: reasoning

Planning

Forming and executing action plans for complex tasks.

Category: planning

Chart understanding

Reading and interpreting charts, tables and diagrams.

Category: vision

Multilingual

Competence in many natural languages (from a few to over a hundred): understanding, generation, translation, and code-switching within a single conversation. Frontier models support a wide range of languages with comparable quality.

Category: language

Vision encoder

The model's ability to encode images and video frames into dense representations (embeddings), used for downstream tasks or as a backbone for vision-language models.

Category: vision

Structured output

Producing data in structured formats such as JSON.

Category: structured_generation

Benchmark results

13 benchmarks

Humanity's Last Exam (HLE)

accuracy · with tools (search, code-interpreter, web-browsing); HLE-Full

54.0%