Robots Atlas>ROBOTS ATLAS
Claude Opus 4.5

Claude Opus 4.5

Opus 4.5 · Family: Claude
Claude Opus 4.5 is Anthropic's flagship AI model released November 24, 2025. The first model to exceed 80% on SWE-bench Verified (80.9%). Features the effort parameter, extended thinking, computer use, and industry-leading prompt injection resistance.
✓ Active✓ Public accessLLMMultimodalReasoning modelTool-using model📁 Claude
Context window
200K
tokens
Parameters
nieujawnione publicznie
parameters
Max output
64,000
tokens
Release date
24 November 2025
Access:APIHostedDeployment:☁ Cloud

Overview

Claude Opus 4.5 is the flagship language model from Anthropic, released on November 24, 2025. The API model identifier is claude-opus-4-5-20251101. The model belongs to the Claude 4 family and served as Anthropic's most capable model until the release of Claude Opus 4.6.

Key features

Opus 4.5 introduces an effort parameter controlling reasoning intensity, extended thinking, advanced computer use, and a multi-tool calling mode. The context window is 200,000 tokens, with a maximum output of 64,000 tokens. Knowledge cutoff date: May 2025.

Benchmark results

Opus 4.5 is the first model to exceed the 80% threshold on SWE-bench Verified (80.9%), ahead of GPT-5.1 (76.3%) and Gemini 3 Pro (76.2%). It scores 59.3% on Terminal-Bench 2.0, 37.6% on ARC-AGI-2, 66.3% on OSWorld (a threefold improvement over Claude 3.5), and 87.0% on GPQA Diamond. On the multi-step tool use benchmark MCP Atlas, it achieves 62.3% — significantly above the second-place result (Claude Sonnet 4.5, 43.8%).

Safety

The model is deployed under the AI Safety Level 3 (ASL-3) standard. According to Anthropic, it is their best-aligned model at the time of release. It offers industry-leading resistance to prompt injection — 4.7% attack success rate (Gray Swan), compared to 12.5% for Gemini 3 Pro and 21.9% for GPT-5.1 (lower is better). Training incorporates RLHF, RLAIF (Constitutional AI), and an inoculation strategy against reward hacking.

Availability and pricing

The model uses closed weights and is available via the Claude API (Anthropic), Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing: $5 per million input tokens and $25 per million output tokens — a reduction of approximately 67% compared to Opus 4.1 ($15/$75).

Classification
LLMMultimodalReasoning modelTool-using model
Family: Claude
Access & deployment
APIHosted
Cloud
Weights: Closed
Key parameters
📏 Context: 200K
🧩 Parameters: nieujawnione publicznie
Tools
📥 Input: text, image, documents

Technical specification

Context window
200K
tokens
Parameters
nieujawnione publicznie
parameters
Max output tokens
64,000
tokens per response
Knowledge cutoff
1 May 2025
Knowledge boundary
License
proprietary
Hardware requirements
Access via Anthropic infrastructure, AWS Bedrock, or Google Vertex AI. Local deployment and open weights are not available.
Features:Tool use
Modalities
⬇ Input
textimagedocuments
⬆ Output
textcodestructured_datasummariesreports

Capabilities and applications

Native model capabilities
Reasoning
The model's ability to reason logically and solve complex problems.
Category: reasoning
Multi-step reasoning
Carrying out multi-step chains of reasoning across long, complex tasks.
Category: reasoning
Long context
Maintaining coherence and focus across very long input context.
Category: language
Coding
Generating, analysing and modifying source code.
Category: coding
Function Calling
Category: planning
Structured output
Producing data in structured formats such as JSON.
Category: structured_generation
Image understanding
Analysing and interpreting the content of images.
Category: vision
Chart understanding
Reading and interpreting charts, tables and diagrams.
Category: vision
OCR
Recognising text within images and documents.
Category: vision
Multilingual
Understanding and generating text in many languages.
Category: language
Planning
Forming and executing action plans for complex tasks.
Category: planning
Streaming output
Category: reasoning

Benchmark results

16 benchmarks
SWE-bench Verified
accuracy · No thinking budget; default effort (high); averaged over 5 independent runs; improved hosting environment (Terminus-2)
80.9%
📅 24 Nov 2025📄 Anthropic – oficjalny blog i system card (claude-opus-4-5)
First AI model to exceed the 80% threshold on SWE-bench Verified. Score higher than GPT-5.1 (76.3%) and Gemini 3 Pro (76.2%).
Terminal-Bench 2.0
accuracy · 128K thinking budget; averaged over 5 runs.
59.3%
📅 24 Nov 2025📄 Anthropic system card / Vellum AI analysis
Better than Gemini 3 Pro (54.2%) and GPT-5.1 (47.6%). Highest Terminal-Bench Hard score (44%) of all models tested by Artificial Analysis.
ARC-AGI-2
accuracy
37.6%
📅 24 Nov 2025📄 Anthropic system card / Vellum AI
More than twice the score of GPT-5.1 (17.6%); higher than Gemini 3 Pro (31.1%). Significant improvement in abstract non-verbal reasoning.
OSWorld
accuracy
66.3%
📅 24 Nov 2025📄 Anthropic system card / DataCamp
Three-fold improvement over Claude 3.5 (22%). Best Anthropic score for computer use at launch.
GPQA Diamond
accuracy · With extended thinking (64K token budget)
87.0%
📅 24 Nov 2025📄 Vellum AI / Artificial Analysis
Slightly below Gemini 3 Pro (91.9%) and GPT-5.1 (88.1%). A strong PhD-level result.
Humanity's Last Exam
accuracy · Z web search
43.2%
📅 24 Nov 2025📄 Vellum AI
Without web search: ~30.8%. Gemini 3 Pro achieves ~37.5% without tools, ~45.2% with tools.
MMMLU
accuracy
90.8%
📅 24 Nov 2025📄 Vellum AI / Anthropic system card
Slightly lower than Gemini 3 Pro (91.8%) and GPT-5.1 (91.0%). Higher than Claude Sonnet 4.5 (89.1%).
MMMU
accuracy · Z extended thinking
80.7%
📅 24 Nov 2025📄 Vellum AI / Anthropic system card
Lowest score in its class (GPT-5.1: 85.4%, Gemini 3 Pro: 81.0%).
MCP Atlas (scaled tool use)
accuracy
62.3%
📅 24 Nov 2025📄 Anthropic system card / DataCamp
Large margin: the second result belongs to Claude Sonnet 4.5 at 43.8%. A benchmark evaluating simultaneous use of multiple tools.
SpreadsheetBench
accuracy
64.25%
📅 24 Nov 2025📄 Zvi Mowshowitz / LessWrong (dane z system card)
Benchmark evaluating spreadsheet automation.
CyberGym
pass@1
50.6%
📅 24 Nov 2025📄 Zvi Mowshowitz / LessWrong / AIToolsReview (dane z system card)
1507 tasks based on real CVE vulnerabilities in open-source projects.
FinanceAgent
accuracy · External score; internal: 61.1%
55.2%
📅 24 Nov 2025📄 Zvi Mowshowitz / LessWrong (dane z system card)
Benchmark evaluating complex financial analysis.
Vending-Bench 2
final_balance
$4,967.06USD
📅 24 Nov 2025📄 Anthropic system card / Vellum AI
23% increase over Sonnet 4.5 ($3,849.74). Gemini 3 Pro leads at $5,478.16. Long-term strategic planning benchmark (one simulated business year).
Gray Swan Prompt Injection
attack_success_rate · Strong prompt injection attacks only; conducted by Gray Swan
4.7%
📅 24 Nov 2025📄 Gray Swan (third-party) / Anthropic system card
Best result in the industry. Gemini 3 Pro: 12.5%; GPT-5.1: 21.9%. Lower = better.
AIME 2025
accuracy · Z Python tools
100%
📅 24 Nov 2025📄 The Neuron / multiple sources
Score of 100% when using Python tools; without tools, the score is not officially disclosed by Anthropic.
LAB-Bench FigQA
accuracy · Baseline; with tools and reasoning: 69.2%
54.9%
📅 24 Nov 2025📄 Anthropic system card (via Zvi Mowshowitz)
Benchmark for understanding scientific diagrams.

Pricing

Technical architecture

Deployment and security

🔒 Security / Enterprise
✓ Verified enterprise information

Claude Opus 4.5 uses the publicly documented platform-level security controls provided by Anthropic. Security information applies primarily to Claude as a product, the Anthropic API, and enterprise features, rather than a separate security profile specific to the Opus 4.5 version.

In practice, the security of Opus 4.5 should be treated as inherited security from the Anthropic platform and enterprise controls.
Updated: 15 Mar 2026↗ Security documentation