Claude Opus 4.5

Opus 4.5 · Family: Claude

Claude Opus 4.5 is Anthropic's flagship AI model released November 24, 2025. The first model to exceed 80% on SWE-bench Verified (80.9%). Features the effort parameter, extended thinking, computer use, and industry-leading prompt injection resistance.

✓ Active✓ Public accessLLMMultimodalReasoning modelTool-using model📁 Claude

Context window

200K

tokens

Parameters

nieujawnione publicznie

parameters

Max output

64,000

tokens

Release date

24 November 2025

🏢AnthropicProducer

Access:APIHostedDeployment:☁ Cloud

Overview

Claude Opus 4.5 is the flagship language model from Anthropic, released on November 24, 2025. The API model identifier is claude-opus-4-5-20251101. The model belongs to the Claude 4 family and served as Anthropic's most capable model until the release of Claude Opus 4.6.

Key features

Opus 4.5 introduces an effort parameter controlling reasoning intensity, extended thinking, advanced computer use, and a multi-tool calling mode. The context window is 200,000 tokens, with a maximum output of 64,000 tokens. Knowledge cutoff date: May 2025.

Benchmark results

Opus 4.5 is the first model to exceed the 80% threshold on SWE-bench Verified (80.9%), ahead of GPT-5.1 (76.3%) and Gemini 3 Pro (76.2%). It scores 59.3% on Terminal-Bench 2.0, 37.6% on ARC-AGI-2, 66.3% on OSWorld (a threefold improvement over Claude 3.5), and 87.0% on GPQA Diamond. On the multi-step tool use benchmark MCP Atlas, it achieves 62.3% — significantly above the second-place result (Claude Sonnet 4.5, 43.8%).

Safety

The model is deployed under the AI Safety Level 3 (ASL-3) standard. According to Anthropic, it is their best-aligned model at the time of release. It offers industry-leading resistance to prompt injection — 4.7% attack success rate (Gray Swan), compared to 12.5% for Gemini 3 Pro and 21.9% for GPT-5.1 (lower is better). Training incorporates RLHF, RLAIF (Constitutional AI), and an inoculation strategy against reward hacking.

Availability and pricing

The model uses closed weights and is available via the Claude API (Anthropic), Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing: $5 per million input tokens and $25 per million output tokens — a reduction of approximately 67% compared to Opus 4.1 ($15/$75).

Classification

LLMMultimodalReasoning modelTool-using model

Family: Claude

Applications

Chatbots Document analysis Data analysis Summarization Translation

Access & deployment

APIHosted

Cloud

Weights: Closed

Key parameters

📏 Context: 200K

🧩 Parameters: nieujawnione publicznie

✓ Tools

📥 Input: text, image, documents

Platforms

Anthropic Claude API Vertex AI Amazon Bedrock Microsoft Azure AI Foundry

Technical specification

Context window

200K

tokens

Parameters

nieujawnione publicznie

parameters

Max output tokens

64,000

tokens per response

Knowledge cutoff

1 May 2025

Knowledge boundary

License

proprietary

Hardware requirements

Access via Anthropic infrastructure, AWS Bedrock, or Google Vertex AI. Local deployment and open weights are not available.

Features:✓ Tool use

Modalities

⬇ Input

textimagedocuments

⬆ Output

textcodestructured_datasummariesreports

Capabilities and applications

Native model capabilities

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Multi-step reasoning

Carrying out multi-step chains of reasoning across long, complex tasks.

Category: reasoning

Long context

Maintaining coherence and focus across very long input context.

Category: language

Coding

Generating, analysing and modifying source code.

Category: coding

Function Calling

Category: planning

Structured output

Producing data in structured formats such as JSON.

Category: structured_generation

Image understanding

Analysing and interpreting the content of images.

Category: vision

Chart understanding

Reading and interpreting charts, tables and diagrams.

Category: vision

OCR

Recognising text within images and documents.

Category: vision

Multilingual

Understanding and generating text in many languages.

Category: language

Planning

Forming and executing action plans for complex tasks.

Category: planning

Streaming output

Category: reasoning

Application domains

Chatbots Document analysis Data analysis Summarization Translation

Benchmark results

16 benchmarks

SWE-bench Verified

accuracy · No thinking budget; default effort (high); averaged over 5 independent runs; improved hosting environment (Terminus-2)

80.9%

📅 24 Nov 2025📄 Anthropic – oficjalny blog i system card (claude-opus-4-5)

First AI model to exceed the 80% threshold on SWE-bench Verified. Score higher than GPT-5.1 (76.3%) and Gemini 3 Pro (76.2%).

Terminal-Bench 2.0

accuracy · 128K thinking budget; averaged over 5 runs.

59.3%

📅 24 Nov 2025📄 Anthropic system card / Vellum AI analysis

Better than Gemini 3 Pro (54.2%) and GPT-5.1 (47.6%). Highest Terminal-Bench Hard score (44%) of all models tested by Artificial Analysis.

ARC-AGI-2

accuracy

37.6%

📅 24 Nov 2025📄 Anthropic system card / Vellum AI

More than twice the score of GPT-5.1 (17.6%); higher than Gemini 3 Pro (31.1%). Significant improvement in abstract non-verbal reasoning.

OSWorld

accuracy

66.3%

📅 24 Nov 2025📄 Anthropic system card / DataCamp

Three-fold improvement over Claude 3.5 (22%). Best Anthropic score for computer use at launch.

GPQA Diamond

accuracy · With extended thinking (64K token budget)

87.0%

📅 24 Nov 2025📄 Vellum AI / Artificial Analysis

Slightly below Gemini 3 Pro (91.9%) and GPT-5.1 (88.1%). A strong PhD-level result.

Humanity's Last Exam

accuracy · Z web search

43.2%

📅 24 Nov 2025📄 Vellum AI

Without web search: ~30.8%. Gemini 3 Pro achieves ~37.5% without tools, ~45.2% with tools.

MMMLU

accuracy

90.8%

📅 24 Nov 2025📄 Vellum AI / Anthropic system card

Slightly lower than Gemini 3 Pro (91.8%) and GPT-5.1 (91.0%). Higher than Claude Sonnet 4.5 (89.1%).

MMMU

accuracy · Z extended thinking

80.7%

📅 24 Nov 2025📄 Vellum AI / Anthropic system card

Lowest score in its class (GPT-5.1: 85.4%, Gemini 3 Pro: 81.0%).

MCP Atlas (scaled tool use)

accuracy

62.3%

📅 24 Nov 2025📄 Anthropic system card / DataCamp

Large margin: the second result belongs to Claude Sonnet 4.5 at 43.8%. A benchmark evaluating simultaneous use of multiple tools.

SpreadsheetBench

accuracy

64.25%

📅 24 Nov 2025📄 Zvi Mowshowitz / LessWrong (dane z system card)

Benchmark evaluating spreadsheet automation.

CyberGym

pass@1

50.6%

📅 24 Nov 2025📄 Zvi Mowshowitz / LessWrong / AIToolsReview (dane z system card)

1507 tasks based on real CVE vulnerabilities in open-source projects.

FinanceAgent

accuracy · External score; internal: 61.1%

55.2%

📅 24 Nov 2025📄 Zvi Mowshowitz / LessWrong (dane z system card)

Benchmark evaluating complex financial analysis.

Vending-Bench 2

final_balance

$4,967.06USD

📅 24 Nov 2025📄 Anthropic system card / Vellum AI

23% increase over Sonnet 4.5 ($3,849.74). Gemini 3 Pro leads at $5,478.16. Long-term strategic planning benchmark (one simulated business year).

Gray Swan Prompt Injection

attack_success_rate · Strong prompt injection attacks only; conducted by Gray Swan

4.7%

📅 24 Nov 2025📄 Gray Swan (third-party) / Anthropic system card

Best result in the industry. Gemini 3 Pro: 12.5%; GPT-5.1: 21.9%. Lower = better.

AIME 2025

accuracy · Z Python tools

100%

📅 24 Nov 2025📄 The Neuron / multiple sources

Score of 100% when using Python tools; without tools, the score is not officially disclosed by Anthropic.

LAB-Bench FigQA

accuracy · Baseline; with tools and reasoning: 69.2%

54.9%

📅 24 Nov 2025📄 Anthropic system card (via Zvi Mowshowitz)

Benchmark for understanding scientific diagrams.

Pricing

Technical architecture

Core Architecture

TRTransformer

Model Form

RMReasoning model MLMultimodal LLM

Training Techniques

ITInstruction Tuning COCoT RLRLHF

Deployment and security

☁ Available on platforms

☁Anthropic Claude APIPlatform ☁Vertex AIPlatform ☁Amazon BedrockPlatform ☁Microsoft Azure AI FoundryPlatform

🔒 Security / Enterprise

✓ Verified enterprise information

Claude Opus 4.5 uses the publicly documented platform-level security controls provided by Anthropic. Security information applies primarily to Claude as a product, the Anthropic API, and enterprise features, rather than a separate security profile specific to the Opus 4.5 version.

In practice, the security of Opus 4.5 should be treated as inherited security from the Anthropic platform and enterprise controls.

Updated: 15 Mar 2026↗ Security documentation

Sources and related pages

11 sources

BlogIntroducing Claude Opus 4.5anthropic.com DocsModels overview - Claude API Docsdocs.anthropic.com DocsPricing - Claude API Docsdocs.anthropic.com DocsComputer use tool - Claude API Docsdocs.anthropic.com WebAnthropic Transparency Hubanthropic.com ReportClaude Opus 4.5 System Card – Anthropicanthropic.com DocsWhat's new in Claude 4.5 – Claude API Docsplatform.claude.com WebClaude Opus 4.5 on Vertex AI – Google Cloud Blogcloud.google.com WebIntroducing Claude Opus 4.5 in Microsoft Foundry – Azure Blogazure.microsoft.com WebClaude Opus 4.5 now in Amazon Bedrock – AWS Blogaws.amazon.com WebClaude Opus 4.5 – Azure AI Foundry Model Catalogai.azure.com

Browse related topics

📁 Claude 🌐 Chatbots 🌐 Document analysis 🌐 Data analysis 🌐 Summarization 🧠 Transformer 🧠 Reasoning model 🧠 Multimodal LLM ☁ Anthropic Claude API ☁ Vertex AI All llm models All multimodal model models