Claude Opus 4.6

4.6 · Family: Claude

Anthropic's flagship language model from the Claude 4 family, released February 5, 2026, featuring a 1M token context window, adaptive reasoning, and advanced agentic capabilities.

✓ Active✓ Public accessLLMMultimodalReasoning modelTool-using model📁 Claude

Context window

tokens

Max output

128,000

tokens

Release date

5 February 2026

🏢AnthropicProducer

Access:APIHostedDeployment:☁ Cloud

Overview

Claude Opus 4.6 is the flagship language model from Anthropic, released on February 5, 2026. It is the direct successor to Claude Opus 4.5 and belongs to the Claude 4 model family. The API model identifier is claude-opus-4-6. It was Anthropic's most capable publicly available model until the release of Claude Opus 4.7 in April 2026.

Key improvements over Opus 4.5

Opus 4.6 introduces a 1 million token context window as the first model in the Opus class to do so, alongside a maximum output of 128,000 tokens. The model plans more carefully, sustains agentic tasks over longer horizons, operates more reliably across large code repositories, and better detects its own errors during code review and debugging. A notable advancement is a qualitative leap in retrieving information from long contexts: on the MRCR v2 benchmark in the 8-needle variant at 1M tokens, Opus 4.6 scores 76%, compared to just 18.5% for Sonnet 4.5.

The model introduces adaptive thinking, in which the model itself determines — based on context — how intensively to engage extended reasoning. Four effort levels are available: low, medium, high (default), and max. A new context compaction mechanism automatically summarizes older portions of a conversation server-side, enabling effectively unlimited agentic session lengths.

New product features

Claude Code introduces agent teams, enabling multiple independent Claude instances to work in parallel on different parts of a project with peer-to-peer communication via the Mailbox Protocol. Available as a preview, Claude in PowerPoint reads slide layouts, fonts, and master templates to generate presentations consistent with an organization's visual identity. Claude in Excel received significant improvements. A fast mode is also available in research preview for Opus 4.6, accelerating output token generation up to 2.5× faster at 6× the price (30 USD / 150 USD per million input/output tokens).

Safety and compliance

According to Anthropic, Opus 4.6 exhibits an overall safety profile at least as strong as its predecessor, Claude Opus 4.5, with low rates of non-compliant behavior including deception, sycophancy, and facilitation of misuse. It achieves the lowest over-refusal rate among recently released Claude models. A full safety evaluation is available in the official Claude Opus 4.6 system card.

Classification

LLMMultimodalReasoning modelTool-using model

Family: Claude

Access & deployment

APIHosted

Cloud

Weights: Closed

Key parameters

📏 Context: 1M

✓ Tools

📥 Input: text, image, documents

Technical specification

Context window

tokens

Max output tokens

128,000

tokens per response

Knowledge cutoff

31 May 2025

Knowledge boundary

License

Komercyjna (zastrzeżona, model zamknięty)

Hardware requirements

Closed model, available exclusively via API. Local deployment is not supported. Available through: Claude API (Anthropic), Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry.

Features:✓ Tool use

Modalities

⬇ Input

textimagedocuments

⬆ Output

textcodestructured_data

Capabilities and applications

Native model capabilities

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Multi-step reasoning

Carrying out multi-step chains of reasoning across long, complex tasks.

Category: reasoning

Long context

Maintaining coherence and focus across very long input context.

Category: language

Coding

Generating, analysing and modifying source code.

Category: coding

Function Calling

Category: planning

Structured output

Producing data in structured formats such as JSON.

Category: structured_generation

Image understanding

Analysing and interpreting the content of images.

Category: vision

Chart understanding

Reading and interpreting charts, tables and diagrams.

Category: vision

OCR

Recognising text within images and documents.

Category: vision

Multilingual

Understanding and generating text in many languages.

Category: language

Planning

Forming and executing action plans for complex tasks.

Category: planning

Streaming output

Category: reasoning

Benchmark results

13 benchmarks

Terminal-Bench 2.0

pass@1 · Agentic coding and system operations benchmark in a terminal environment

65.4%

📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie, 5 lutego 2026

Highest score among frontier models at launch, according to Anthropic. Increase from 59.3% (Opus 4.5).

Humanity's Last Exam (bez narzędzi)

accuracy · Multidisciplinary academic knowledge benchmark at frontier level; no access to external tools

40.0%

📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026

Increase from 30.8% (Opus 4.5). First place among frontier models at launch. Score updated on 23 February 2026 to 40.0% following improvements to fraud detection.

Humanity's Last Exam (z narzędziami)

accuracy · Multidisciplinary academic knowledge benchmark with access to external tools

53.0%

📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026; zaktualizowane 23 lutego 2026

Result originally reported as 53.1%, corrected to 53.0% on February 23, 2026, following the deployment of an improved fraud-detection pipeline (3 additional cases excluded).

SWE-bench Verified

pass@1 · Verified subset of 500 real GitHub issues

80.8%

📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026

Slight drop from 80.9% (Opus 4.5); Anthropic focused optimization efforts on other areas.

GDPval-AA (Praca merytoryczna)

Wynik Elo · Artificial Analysis benchmark measuring the economic value of substantive work in finance and law domains.

1606Elo

📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie, 5 lutego 2026

Outperforms GPT-5.2 by approximately 144 Elo points and Opus 4.5 by 190 points.

BrowseComp

accuracy · Benchmark measuring the model's ability to locate hard-to-find information on the Internet; multi-agent configuration at max effort

84.0%

📅 5 Feb 2026📄 Vellum AI / Anthropic — oficjalne ogłoszenie, 5 lutego 2026

Best result among frontier models at launch, according to Anthropic.

ARC AGI 2

accuracy · Abstract reasoning benchmark; max effort used with a thinking budget of 120k.

68.8%

📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026

Increase from 37.6% (Opus 4.5) — one of the largest jumps on this benchmark in the history of frontier model updates. Surpasses GPT-5.2 Pro (54.2%) and Gemini 3 Pro (45.1%).

GPQA Diamond

accuracy · Doctoral-level scientific knowledge benchmark

91.3%

📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026

Increase from 87.0% (Opus 4.5).

OSWorld

accuracy · Benchmark for agentic graphical interface control (computer use)

72.7%

📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026

Increase from approximately 66.3% (Opus 4.5).

MRCR v2 (8 igieł, 1M tokenów)

accuracy · Benchmark for retrieving multiple facts hidden within a very long text; 8-needle variant at 1M tokens

76.0%

📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie, 5 lutego 2026

Score increase from 18.5% (Sonnet 4.5). Opus 4.6 reaches 93% at 256K tokens.

Finance Agent

accuracy · Multi-step financial analysis benchmark

60.7%

📅 5 Feb 2026📄 Vellum AI / Anthropic — oficjalne ogłoszenie, 5 lutego 2026

Leads among compared models at launch date.

BigLaw Bench (Harvey)

accuracy · Legal benchmark on BigLaw tasks; 40% perfect scores, 84% above 0.8.

90.2%

📅 5 Feb 2026📄 Harvey / oficjalne ogłoszenie Anthropic, 5 lutego 2026

Highest BigLaw Bench score among Claude models at launch.

MCP Atlas

accuracy · Multi-step, scaled tool use benchmark

75.8%

📅 5 Feb 2026📄 Anthropic — karta systemowa, 5 lutego 2026; zrewidowane przez Scale AI

Score updated by Scale AI following a change in evaluation methodology (originally 59.5%). Opus 4.7 achieved 77.3% on this benchmark.

Pricing

Deployment and security

🔒 Security / Enterprise

✓ Verified enterprise information

Claude Opus 4.6 demonstrates a safety profile at least as strong as Opus 4.5, with low rates of policy-violating behavior. On Amazon Bedrock, a zero operator access policy applies. Data residency options are available via the inference_geo parameter (1.1× multiplier for US-only inference). Enterprise security and compliance details: trust.anthropic.com.

Full safety evaluation is available in the Claude Opus 4.6 system card at anthropic.com/claude-opus-4-6-system-card. Opus 4.6 achieves the lowest over-refusal rate among recent Claude models. Enhanced planning capabilities may theoretically increase the potential for obfuscation in cases of misalignment — this issue is discussed in the system card.

Updated: 5 Feb 2026↗ Security documentation

Sources and related pages

7 sources

BlogIntroducing Claude Opus 4.6 — Anthropicanthropic.com DocsModels overview — Claude API Docsplatform.claude.com DocsPricing — Claude API Docsplatform.claude.com DocsClaude Platform release notes — Claude API Docsplatform.claude.com DocsContext windows — Claude API Docsplatform.claude.com DocsHow up-to-date is Claude's training data — Claude Help Centersupport.claude.com WebClaude Opus 4.6 — strona produktu Anthropicanthropic.com

Browse related topics

📁 Claude All llm models All multimodal model models