Robots Atlas>ROBOTS ATLAS
Claude Opus 4.6

Claude Opus 4.6

4.6 · Family: Claude
Anthropic's flagship language model from the Claude 4 family, released February 5, 2026, featuring a 1M token context window, adaptive reasoning, and advanced agentic capabilities.
✓ Active✓ Public accessLLMMultimodalReasoning modelTool-using model📁 Claude
Context window
1M
tokens
Max output
128,000
tokens
Release date
5 February 2026
Access:APIHostedDeployment:☁ Cloud

Overview

Claude Opus 4.6 is the flagship language model from Anthropic, released on February 5, 2026. It is the direct successor to Claude Opus 4.5 and belongs to the Claude 4 model family. The API model identifier is claude-opus-4-6. It was Anthropic's most capable publicly available model until the release of Claude Opus 4.7 in April 2026.

Key improvements over Opus 4.5

Opus 4.6 introduces a 1 million token context window as the first model in the Opus class to do so, alongside a maximum output of 128,000 tokens. The model plans more carefully, sustains agentic tasks over longer horizons, operates more reliably across large code repositories, and better detects its own errors during code review and debugging. A notable advancement is a qualitative leap in retrieving information from long contexts: on the MRCR v2 benchmark in the 8-needle variant at 1M tokens, Opus 4.6 scores 76%, compared to just 18.5% for Sonnet 4.5.

The model introduces adaptive thinking, in which the model itself determines — based on context — how intensively to engage extended reasoning. Four effort levels are available: low, medium, high (default), and max. A new context compaction mechanism automatically summarizes older portions of a conversation server-side, enabling effectively unlimited agentic session lengths.

New product features

Claude Code introduces agent teams, enabling multiple independent Claude instances to work in parallel on different parts of a project with peer-to-peer communication via the Mailbox Protocol. Available as a preview, Claude in PowerPoint reads slide layouts, fonts, and master templates to generate presentations consistent with an organization's visual identity. Claude in Excel received significant improvements. A fast mode is also available in research preview for Opus 4.6, accelerating output token generation up to 2.5× faster at 6× the price (30 USD / 150 USD per million input/output tokens).

Safety and compliance

According to Anthropic, Opus 4.6 exhibits an overall safety profile at least as strong as its predecessor, Claude Opus 4.5, with low rates of non-compliant behavior including deception, sycophancy, and facilitation of misuse. It achieves the lowest over-refusal rate among recently released Claude models. A full safety evaluation is available in the official Claude Opus 4.6 system card.

Classification
LLMMultimodalReasoning modelTool-using model
Family: Claude
Access & deployment
APIHosted
Cloud
Weights: Closed
Key parameters
📏 Context: 1M
Tools
📥 Input: text, image, documents

Technical specification

Context window
1M
tokens
Max output tokens
128,000
tokens per response
Knowledge cutoff
31 May 2025
Knowledge boundary
License
Komercyjna (zastrzeżona, model zamknięty)
Hardware requirements
Closed model, available exclusively via API. Local deployment is not supported. Available through: Claude API (Anthropic), Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry.
Features:Tool use
Modalities
⬇ Input
textimagedocuments
⬆ Output
textcodestructured_data

Capabilities and applications

Native model capabilities
Reasoning
The model's ability to reason logically and solve complex problems.
Category: reasoning
Multi-step reasoning
Carrying out multi-step chains of reasoning across long, complex tasks.
Category: reasoning
Long context
Maintaining coherence and focus across very long input context.
Category: language
Coding
Generating, analysing and modifying source code.
Category: coding
Function Calling
Category: planning
Structured output
Producing data in structured formats such as JSON.
Category: structured_generation
Image understanding
Analysing and interpreting the content of images.
Category: vision
Chart understanding
Reading and interpreting charts, tables and diagrams.
Category: vision
OCR
Recognising text within images and documents.
Category: vision
Multilingual
Understanding and generating text in many languages.
Category: language
Planning
Forming and executing action plans for complex tasks.
Category: planning
Streaming output
Category: reasoning

Benchmark results

13 benchmarks
Terminal-Bench 2.0
pass@1 · Agentic coding and system operations benchmark in a terminal environment
65.4%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie, 5 lutego 2026
Highest score among frontier models at launch, according to Anthropic. Increase from 59.3% (Opus 4.5).
Humanity's Last Exam (bez narzędzi)
accuracy · Multidisciplinary academic knowledge benchmark at frontier level; no access to external tools
40.0%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026
Increase from 30.8% (Opus 4.5). First place among frontier models at launch. Score updated on 23 February 2026 to 40.0% following improvements to fraud detection.
Humanity's Last Exam (z narzędziami)
accuracy · Multidisciplinary academic knowledge benchmark with access to external tools
53.0%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026; zaktualizowane 23 lutego 2026
Result originally reported as 53.1%, corrected to 53.0% on February 23, 2026, following the deployment of an improved fraud-detection pipeline (3 additional cases excluded).
SWE-bench Verified
pass@1 · Verified subset of 500 real GitHub issues
80.8%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026
Slight drop from 80.9% (Opus 4.5); Anthropic focused optimization efforts on other areas.
GDPval-AA (Praca merytoryczna)
Wynik Elo · Artificial Analysis benchmark measuring the economic value of substantive work in finance and law domains.
1606Elo
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie, 5 lutego 2026
Outperforms GPT-5.2 by approximately 144 Elo points and Opus 4.5 by 190 points.
BrowseComp
accuracy · Benchmark measuring the model's ability to locate hard-to-find information on the Internet; multi-agent configuration at max effort
84.0%
📅 5 Feb 2026📄 Vellum AI / Anthropic — oficjalne ogłoszenie, 5 lutego 2026
Best result among frontier models at launch, according to Anthropic.
ARC AGI 2
accuracy · Abstract reasoning benchmark; max effort used with a thinking budget of 120k.
68.8%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026
Increase from 37.6% (Opus 4.5) — one of the largest jumps on this benchmark in the history of frontier model updates. Surpasses GPT-5.2 Pro (54.2%) and Gemini 3 Pro (45.1%).
GPQA Diamond
accuracy · Doctoral-level scientific knowledge benchmark
91.3%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026
Increase from 87.0% (Opus 4.5).
OSWorld
accuracy · Benchmark for agentic graphical interface control (computer use)
72.7%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026
Increase from approximately 66.3% (Opus 4.5).
MRCR v2 (8 igieł, 1M tokenów)
accuracy · Benchmark for retrieving multiple facts hidden within a very long text; 8-needle variant at 1M tokens
76.0%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie, 5 lutego 2026
Score increase from 18.5% (Sonnet 4.5). Opus 4.6 reaches 93% at 256K tokens.
Finance Agent
accuracy · Multi-step financial analysis benchmark
60.7%
📅 5 Feb 2026📄 Vellum AI / Anthropic — oficjalne ogłoszenie, 5 lutego 2026
Leads among compared models at launch date.
BigLaw Bench (Harvey)
accuracy · Legal benchmark on BigLaw tasks; 40% perfect scores, 84% above 0.8.
90.2%
📅 5 Feb 2026📄 Harvey / oficjalne ogłoszenie Anthropic, 5 lutego 2026
Highest BigLaw Bench score among Claude models at launch.
MCP Atlas
accuracy · Multi-step, scaled tool use benchmark
75.8%
📅 5 Feb 2026📄 Anthropic — karta systemowa, 5 lutego 2026; zrewidowane przez Scale AI
Score updated by Scale AI following a change in evaluation methodology (originally 59.5%). Opus 4.7 achieved 77.3% on this benchmark.

Pricing

Deployment and security

🔒 Security / Enterprise
✓ Verified enterprise information

Claude Opus 4.6 demonstrates a safety profile at least as strong as Opus 4.5, with low rates of policy-violating behavior. On Amazon Bedrock, a zero operator access policy applies. Data residency options are available via the inference_geo parameter (1.1× multiplier for US-only inference). Enterprise security and compliance details: trust.anthropic.com.

Full safety evaluation is available in the Claude Opus 4.6 system card at anthropic.com/claude-opus-4-6-system-card. Opus 4.6 achieves the lowest over-refusal rate among recent Claude models. Enhanced planning capabilities may theoretically increase the potential for obfuscation in cases of misalignment — this issue is discussed in the system card.
Updated: 5 Feb 2026↗ Security documentation