Anthropic's flagship language model from the Claude 4 family, released February 5, 2026, featuring a 1M token context window, adaptive reasoning, and advanced agentic capabilities.
Context window
1M
tokens
Max output
128,000
tokens
Release date
5 February 2026
Access:APIHostedDeployment:☁ Cloud
Overview
Access & deployment
APIHosted
Cloud
Weights: Closed
Key parameters
📏 Context: 1M
✓ Tools
📥 Input: text, image, documents
Technical specification
Context window
1M
tokens
Max output tokens
128,000
tokens per response
Knowledge cutoff
31 May 2025
Knowledge boundary
License
Komercyjna (zastrzeżona, model zamknięty)
Hardware requirements
Closed model, available exclusively via API. Local deployment is not supported. Available through: Claude API (Anthropic), Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry.
Features:✓ Tool use
Modalities
⬇ Input
textimagedocuments
⬆ Output
textcodestructured_data
Capabilities and applications
Native model capabilities
Reasoning
The model's ability to reason logically and solve complex problems.
Category: reasoning
Multi-step reasoning
Carrying out multi-step chains of reasoning across long, complex tasks.
Category: reasoning
Long context
Maintaining coherence and focus across very long input context.
Category: language
Coding
Generating, analysing and modifying source code.
Category: coding
Function Calling
Category: planning
Structured output
Producing data in structured formats such as JSON.
Category: structured_generation
Image understanding
Analysing and interpreting the content of images.
Category: vision
Chart understanding
Reading and interpreting charts, tables and diagrams.
Category: vision
OCR
Recognising text within images and documents.
Category: vision
Multilingual
Understanding and generating text in many languages.
Category: language
Planning
Forming and executing action plans for complex tasks.
Category: planning
Streaming output
Category: reasoning
Benchmark results
13 benchmarks
Terminal-Bench 2.0
pass@1 · Agentic coding and system operations benchmark in a terminal environment
65.4%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie, 5 lutego 2026
Highest score among frontier models at launch, according to Anthropic. Increase from 59.3% (Opus 4.5).
Humanity's Last Exam (bez narzędzi)
accuracy · Multidisciplinary academic knowledge benchmark at frontier level; no access to external tools
40.0%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026
Increase from 30.8% (Opus 4.5). First place among frontier models at launch. Score updated on 23 February 2026 to 40.0% following improvements to fraud detection.
Humanity's Last Exam (z narzędziami)
accuracy · Multidisciplinary academic knowledge benchmark with access to external tools
53.0%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026; zaktualizowane 23 lutego 2026
Result originally reported as 53.1%, corrected to 53.0% on February 23, 2026, following the deployment of an improved fraud-detection pipeline (3 additional cases excluded).
SWE-bench Verified
pass@1 · Verified subset of 500 real GitHub issues
80.8%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026
Slight drop from 80.9% (Opus 4.5); Anthropic focused optimization efforts on other areas.
GDPval-AA (Praca merytoryczna)
Wynik Elo · Artificial Analysis benchmark measuring the economic value of substantive work in finance and law domains.
1606Elo
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie, 5 lutego 2026
Outperforms GPT-5.2 by approximately 144 Elo points and Opus 4.5 by 190 points.
BrowseComp
accuracy · Benchmark measuring the model's ability to locate hard-to-find information on the Internet; multi-agent configuration at max effort
84.0%
📅 5 Feb 2026📄 Vellum AI / Anthropic — oficjalne ogłoszenie, 5 lutego 2026
Best result among frontier models at launch, according to Anthropic.
ARC AGI 2
accuracy · Abstract reasoning benchmark; max effort used with a thinking budget of 120k.
68.8%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026
Increase from 37.6% (Opus 4.5) — one of the largest jumps on this benchmark in the history of frontier model updates. Surpasses GPT-5.2 Pro (54.2%) and Gemini 3 Pro (45.1%).
GPQA Diamond
accuracy · Doctoral-level scientific knowledge benchmark
91.3%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026
Increase from 87.0% (Opus 4.5).
OSWorld
accuracy · Benchmark for agentic graphical interface control (computer use)
72.7%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie i karta systemowa, 5 lutego 2026
Increase from approximately 66.3% (Opus 4.5).
MRCR v2 (8 igieł, 1M tokenów)
accuracy · Benchmark for retrieving multiple facts hidden within a very long text; 8-needle variant at 1M tokens
76.0%
📅 5 Feb 2026📄 Anthropic — oficjalne ogłoszenie, 5 lutego 2026
Score increase from 18.5% (Sonnet 4.5). Opus 4.6 reaches 93% at 256K tokens.
Finance Agent
accuracy · Multi-step financial analysis benchmark
60.7%
📅 5 Feb 2026📄 Vellum AI / Anthropic — oficjalne ogłoszenie, 5 lutego 2026
Leads among compared models at launch date.
BigLaw Bench (Harvey)
accuracy · Legal benchmark on BigLaw tasks; 40% perfect scores, 84% above 0.8.
90.2%
📅 5 Feb 2026📄 Harvey / oficjalne ogłoszenie Anthropic, 5 lutego 2026
Highest BigLaw Bench score among Claude models at launch.
MCP Atlas
accuracy · Multi-step, scaled tool use benchmark
75.8%
📅 5 Feb 2026📄 Anthropic — karta systemowa, 5 lutego 2026; zrewidowane przez Scale AI
Score updated by Scale AI following a change in evaluation methodology (originally 59.5%). Opus 4.7 achieved 77.3% on this benchmark.
Pricing
Deployment and security
🔒 Security / Enterprise
✓ Verified enterprise information
Claude Opus 4.6 demonstrates a safety profile at least as strong as Opus 4.5, with low rates of policy-violating behavior. On Amazon Bedrock, a zero operator access policy applies. Data residency options are available via the inference_geo parameter (1.1× multiplier for US-only inference). Enterprise security and compliance details: trust.anthropic.com.
Full safety evaluation is available in the Claude Opus 4.6 system card at anthropic.com/claude-opus-4-6-system-card. Opus 4.6 achieves the lowest over-refusal rate among recent Claude models. Enhanced planning capabilities may theoretically increase the potential for obfuscation in cases of misalignment — this issue is discussed in the system card.
Updated: 5 Feb 2026↗ Security documentation
Sources and related pages
7 sources
BlogIntroducing Claude Opus 4.6 — AnthropicDocsModels overview — Claude API DocsDocsPricing — Claude API DocsDocsClaude Platform release notes — Claude API DocsDocsContext windows — Claude API DocsDocsHow up-to-date is Claude's training data — Claude Help CenterWebClaude Opus 4.6 — strona produktu Anthropic
Browse related topics
