Claude Opus 4.7

4.7 · Family: Claude
Anthropic's most advanced publicly available language model, optimized for agentic coding, long-running autonomous tasks, and image reasoning.
Status: Active · Public access
Type: LLM · Multimodal · Reasoning model · Tool-using model
Context window: 1M tokens
Max output: 128,000 tokens
Release date: 16 April 2026
Access: API, Hosted
Deployment: Cloud

Overview

Claude Opus 4.7 is Anthropic's most advanced publicly available language model, released on April 16, 2026. It is the direct successor to Claude Opus 4.6 (released February 2026) and belongs to the Claude 4 model family. The API model identifier is claude-opus-4-7. According to Anthropic, the model is less capable than the proprietary Claude Mythos Preview, but remains the strongest generally available model in the company's lineup.

Key improvements over Opus 4.6

Opus 4.7 delivers significant improvements across four areas. In agentic coding, the model achieved 87.6% on SWE-bench Verified (up from 80.8%) and 64.3% on SWE-bench Pro (up from 53.4%), a new high among publicly available models. In vision, this is the first Claude model to support high-resolution images: up to 2576 pixels on the long side (~3.75 MP), compared to the previous limit of 1568 px / 1.15 MP. Image coordinates now map 1:1 to actual pixels, simplifying work in computer use mode.
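As an illustration of the resolution limit quoted above, here is a minimal sketch of a client-side helper that scales an image's dimensions down so the long side fits within 2576 px. The function names are hypothetical and not part of any SDK, and the sketch assumes the caller chooses to downscale before upload; how the service itself handles oversized images is not described here.

```python
MAX_LONG_SIDE = 2576  # long-side limit in pixels quoted for Opus 4.7


def fit_long_side(width: int, height: int,
                  max_long_side: int = MAX_LONG_SIDE) -> tuple[int, int]:
    """Return (width, height) scaled down so the longer side fits the limit.

    Images already within the limit are returned unchanged.
    """
    long_side = max(width, height)
    if long_side <= max_long_side:
        return width, height
    scale = max_long_side / long_side
    # Keep the aspect ratio; never let a side collapse to zero pixels.
    return max(1, round(width * scale)), max(1, round(height * scale))


def megapixels(width: int, height: int) -> float:
    """Pixel count expressed in millions of pixels."""
    return width * height / 1_000_000
```

For example, `fit_long_side(5152, 3000)` halves both sides so that the long side lands exactly on the limit, while an image of 1000 × 800 px passes through untouched.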

A new effort level, xhigh (extra high), sits between high and max and allows finer control over the trade-off between reasoning depth and response latency. Task budgets are available in public beta: developers set an approximate token limit for an entire agentic cycle, and the model can see a running countdown of what remains.
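To make the countdown idea concrete, the following is a purely client-side sketch of a token budget tracker. It is not the API's task-budget feature, whose parameters are not described on this page; the class and method names are hypothetical, and the limit is treated as approximate, so an overshoot is recorded rather than rejected.

```python
from dataclasses import dataclass


@dataclass
class TaskBudget:
    """Hypothetical local tracker for an approximate per-task token limit."""
    limit: int
    used: int = 0

    @property
    def remaining(self) -> int:
        # The countdown never goes below zero, even after an overshoot.
        return max(0, self.limit - self.used)

    @property
    def exhausted(self) -> bool:
        return self.used >= self.limit

    def record(self, tokens: int) -> int:
        """Add the tokens consumed by one step and return the countdown."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        self.used += tokens
        return self.remaining
```

An agent loop could call `record()` after each step and stop scheduling new work once `exhausted` becomes true.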

The model follows instructions substantially more literally than Opus 4.6 rather than interpreting them loosely, so prompts tuned for Opus 4.6 may need adjusting. Opus 4.7 verifies its own outputs before responding, makes better use of filesystem-based memory, and executes long-running multi-session tasks more reliably. Adaptive thinking is supported and is the only thinking mode available; fixed extended thinking budgets have been removed.

API changes and cybersecurity safeguards

Opus 4.7 introduces breaking changes to the Messages API: (1) setting a fixed thinking budget via budget_tokens returns a 400 error; (2) setting a custom value for temperature, top_p, or top_k returns a 400 error; (3) thinking content is omitted from responses by default. The new tokenizer may produce up to 35% more tokens for the same input text compared to Opus 4.6.

Opus 4.7 is the first Claude model on which Anthropic is testing automated cybersecurity safeguards ahead of a potential broader release of Mythos-class models. Security professionals conducting legitimate tasks (penetration testing, red-teaming, vulnerability research) may apply for access through the Cyber Verification Program.
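The breaking changes listed in this section suggest a simple migration step for request payloads written for Opus 4.6. The sketch below works on a plain dictionary and makes no network calls; the function names are hypothetical. It assumes that budget_tokens sits inside a `thinking` object, as in earlier Messages API requests, and it leaves the rest of that object untouched because the replacement configuration is not specified here.

```python
from copy import deepcopy

# Sampling parameters that, per the text above, now trigger a 400 error.
SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def adapt_request_for_opus_4_7(request: dict) -> dict:
    """Return a copy of the payload without the rejected parameters."""
    adapted = deepcopy(request)  # never mutate the caller's payload
    for key in SAMPLING_KEYS:
        adapted.pop(key, None)
    thinking = adapted.get("thinking")
    if isinstance(thinking, dict):
        # Assumption: the fixed budget lives under thinking["budget_tokens"].
        thinking.pop("budget_tokens", None)
    return adapted


def worst_case_tokens(opus_4_6_tokens: int, inflation_percent: int = 35) -> int:
    """Upper bound on the token count under the new tokenizer (rounded up)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-opus_4_6_tokens * (100 + inflation_percent) // 100)
```

Using the worst-case figure when sizing context and output limits is a conservative choice, since the 35% value is stated as an upper bound rather than a typical increase.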

Safety and model alignment

Anthropic's alignment evaluation found that Opus 4.7 is "largely well-aligned and trustworthy, though not fully ideal in its behavior." The model exhibits a safety profile similar to Opus 4.6, with improvements in honesty and resistance to prompt injection attacks. A minor regression was noted in one area: the model may provide overly detailed harm-reduction advice regarding controlled substances. The full safety evaluation is published in the Claude Opus 4.7 system card (232 pages, published April 16, 2026).

Classification
LLM · Multimodal · Reasoning model · Tool-using model
Family: Claude
Access & deployment
API · Hosted · Cloud
Weights: Closed
Key parameters
Context: 1M
Tools
Input: text, image, documents

Technical specification

Context window: 1M tokens
Max output tokens: 128,000 tokens per response
Knowledge cutoff: 31 Jan 2026
License: Commercial (proprietary, closed model)
Hardware requirements
Closed model, available exclusively via API. Local deployment is not supported. Available through: Claude API (Anthropic), Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry.
Features: Tool use
Modalities
Input: text, image, documents
Output: text, code, structured_data

Capabilities and applications

Native model capabilities
Reasoning
The model's ability to reason logically and solve complex problems.
Category: reasoning
Multi-step reasoning
Carrying out multi-step chains of reasoning across long, complex tasks.
Category: reasoning
Long context
Maintaining coherence and focus across very long input context.
Category: language
Coding
Generating, analysing and modifying source code.
Category: coding
Function calling
Category: planning
Structured output
Producing data in structured formats such as JSON.
Category: structured_generation
Image understanding
Analysing and interpreting the content of images.
Category: vision
Chart understanding
Reading and interpreting charts, tables and diagrams.
Category: vision
OCR
Recognising text within images and documents.
Category: vision
Multilingual
Understanding and generating text in many languages.
Category: language
Planning
Forming and executing action plans for complex tasks.
Category: planning
Streaming output
Category: reasoning

Benchmark results

19 benchmarks
SWE-bench Verified
pass@1 · A verified human-curated subset of 500 real GitHub issues resolved end-to-end; memorization screens applied.
87.6%
16 Apr 2026 · Source: Anthropic, official announcement and system card, 16 April 2026
Increase from 80.8% (Opus 4.6). The improvement margin holds after excluding items identified as potentially memorized.
SWE-bench Pro
pass@1 · Multilingual software engineering benchmark; harder and less contaminated than SWE-bench Verified
64.3%
16 Apr 2026 · Source: Anthropic, official announcement and system card, 16 April 2026
Increase from 53.4% (Opus 4.6). Outperforms GPT-5.4 (57.7%) and Gemini 3.1 Pro (54.2%) among publicly available models.
SWE-bench Multilingual
pass@1
80.5%
16 Apr 2026 · Source: Anthropic, system card, 16 April 2026
Increase from 77.8% (Opus 4.6). An internal implementation was applied to both models.
Terminal-Bench 2.0
pass@1 · Terminus-2 environment, thinking disabled; resource allocation 1× guaranteed / 3× ceiling, averaged over 5 runs per task
69.4%
16 Apr 2026 · Source: Anthropic, official announcement and system card, 16 April 2026
Increase from 65.4% (Opus 4.6). GPT-5.4 self-reports 75.1% based on its own testing environment — the result is not directly comparable.
GPQA Diamond
accuracy · PhD-level scientific knowledge benchmark
94.2%
16 Apr 2026 · Source: Anthropic, official announcement and system card, 16 April 2026
Increase from 91.3% (Opus 4.6). Comparable to GPT-5.4 Pro (94.4%) and Gemini 3.1 Pro (94.3%); benchmark near saturation at the frontier level.
Humanity's Last Exam (no tools)
accuracy · Multidisciplinary academic benchmark without tool access
46.9%
16 Apr 2026 · Source: Anthropic, system card, 16 April 2026
Humanity's Last Exam (with tools)
accuracy · Multidisciplinary academic benchmark with tool access
54.7%
16 Apr 2026 · Source: Anthropic, system card, 16 April 2026
Outperforms Gemini 3.1 Pro (51.4%).
MCP-Atlas
accuracy · Multi-stage, multi-step benchmark of scaled tool use from Scale AI
77.3%
16 Apr 2026 · Source: Anthropic / Scale AI, official announcement and system card, 16 April 2026
Score increased from 75.8% (Opus 4.6; result updated following a Scale AI evaluation methodology change). Highest score among publicly available models — ahead of GPT-5.4 (68.1%) and Gemini 3.1 Pro (73.9%).
OSWorld-Verified
accuracy · Benchmark for agentic GUI control (computer use)
78.0%
16 Apr 2026 · Source: Anthropic, official announcement and system card, 16 April 2026
Increase from 72.7% (Opus 4.6). Surpasses GPT-5.4 (75.0%); 1.6 pts below Mythos Preview (79.6%).
Finance Agent v1.1
accuracy · Multi-step financial analysis benchmark covering financial modeling and presentation creation
64.4%
16 Apr 2026 · Source: Anthropic, official announcement and system card, 16 April 2026
Increase from 60.7% (Opus 4.6). Best result among compared models on release day.
CharXiv Reasoning (no tools)
accuracy · Visual reasoning on arXiv charts and illustrations without tool use
82.1%
16 Apr 2026 · Source: Anthropic, system card, 16 April 2026
Increase from approx. 68.7–69.1% (Opus 4.6). Largest jump in the visual reasoning category.
CharXiv Reasoning (with tools)
accuracy · Visual reasoning on arXiv charts and illustrations using tools
91.0%
16 Apr 2026 · Source: Anthropic, official announcement and system card, 16 April 2026
Increase from 84.7% (Opus 4.6).
BrowseComp
accuracy · Agentic web search benchmark
79.3%
16 Apr 2026 · Source: Anthropic, official announcement and system card, 16 April 2026
Drop from approx. 84.0% (Opus 4.6, measured in a multi-agent configuration at maximum effort). GPT-5.4 Pro achieves 89.3% and Gemini 3.1 Pro 85.9%. This is an area of regression.
CyberGym
accuracy · Cybersecurity vulnerability reproduction benchmark
73.1%
16 Apr 2026 · Source: Anthropic, official announcement and system card, 16 April 2026
Nearly identical to Opus 4.6 (updated score 73.8%) — a deliberate training decision involving differential restriction of cyber capabilities. Mythos Preview reaches 83.1%.
MMMLU (multilingual Q&A)
accuracy · Multilingual Massive Multitask Language Understanding
91.5%
16 Apr 2026 · Source: Anthropic, system card, 16 April 2026
Gemini 3.1 Pro scores approx. 92.6%, a marginal lead over Opus 4.7.
GDPVal-AA (substantive work)
Elo score · Elo-based benchmark measuring the economic value of substantive work in finance and law domains.
1753 Elo
16 Apr 2026 · Source: Anthropic, official announcement and system card, 16 April 2026
Outperforms GPT-5.4 (1674) and Gemini 3.1 Pro (1314).
OfficeQA Pro (Databricks)
accuracy · Enterprise data question-and-answer benchmark.
80.6%
16 Apr 2026 · Source: Anthropic system card / Decrypt review, 16 April 2026
Increase from 57.1% (Opus 4.6). Outperforms GPT-5.4 (51.1%) and Gemini 3.1 Pro (42.9%).
CursorBench
pass@1 · Autonomous coding benchmark in the Cursor editor
70%
16 Apr 2026 · Source: Cursor / official Anthropic announcement, 16 April 2026
Increase from 58% (Opus 4.6). Best result among evaluated models on release day.
BigLaw Bench (Harvey)
accuracy · Legal benchmark on BigLaw tasks; high-effort mode.
90.9%
16 Apr 2026 · Source: Harvey / official Anthropic announcement, 16 April 2026
Submitted by Harvey as part of the early access partner evaluation.
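
The changes relative to Opus 4.6 quoted in the notes above can be tabulated with a few lines of arithmetic. The sketch below uses only figures stated in this list for a selection of benchmarks; the variable and function names are illustrative, and the BrowseComp baseline is the approximate value given in its note.

```python
# (Opus 4.6 score, Opus 4.7 score) in percent, as quoted in the list above.
SCORES = {
    "SWE-bench Verified": (80.8, 87.6),
    "SWE-bench Pro": (53.4, 64.3),
    "SWE-bench Multilingual": (77.8, 80.5),
    "Terminal-Bench 2.0": (65.4, 69.4),
    "GPQA Diamond": (91.3, 94.2),
    "OSWorld-Verified": (72.7, 78.0),
    "OfficeQA Pro": (57.1, 80.6),
    "BrowseComp": (84.0, 79.3),  # baseline quoted as approximate
}


def point_change(old: float, new: float) -> float:
    """Difference in percentage points, rounded to one decimal place."""
    return round(new - old, 1)


deltas = {name: point_change(old, new) for name, (old, new) in SCORES.items()}
largest_gain = max(deltas, key=deltas.get)
regressions = [name for name, delta in deltas.items() if delta < 0]

for name, delta in sorted(deltas.items(), key=lambda item: -item[1]):
    print(f"{name:24s} {delta:+.1f} pts")
```

Among the selected entries, the largest gain is on OfficeQA Pro and the only decline is on BrowseComp, which matches the notes attached to those benchmarks.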

Deployment and security

🔒 Security / Enterprise
✓ Verified enterprise information

Claude Opus 4.7 is the first Claude model with automated real-time cybersecurity safeguards that detect and block requests indicating prohibited or high-risk cyber use cases (Project Glasswing initiative). On Amazon Bedrock, a zero operator-access policy applies — customer prompts and responses are not visible to Anthropic or AWS employees. Data residency options are available via the inference_geo parameter in the Claude API (1.1× multiplier for US-only inference). Regional and multi-regional endpoints on Google Vertex AI and Amazon Bedrock are available at a 10% premium over global endpoints. Enterprise security and compliance information: trust.anthropic.com.
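
The two surcharges mentioned above reduce to simple multipliers. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and the base cost is whatever figure the caller supplies, since no prices are listed on this page.

```python
from decimal import Decimal

US_ONLY_MULTIPLIER = Decimal("1.1")   # US-only inference via inference_geo
REGIONAL_PREMIUM = Decimal("0.10")    # regional / multi-regional endpoints


def us_only_cost(base_cost: Decimal) -> Decimal:
    """Cost of a request when inference is restricted to the US."""
    return base_cost * US_ONLY_MULTIPLIER


def regional_endpoint_cost(global_cost: Decimal) -> Decimal:
    """Cost on a regional endpoint relative to the global endpoint price."""
    return global_cost * (1 + REGIONAL_PREMIUM)
```

Decimal is used instead of float so that monetary amounts are not distorted by binary rounding. Both surcharges work out to the same factor of 1.1, even though one is stated as a multiplier and the other as a percentage premium.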

Security professionals conducting legitimate work (penetration testing, red-teaming, vulnerability research) may apply for access through Anthropic's Cyber Verification Program. A full safety evaluation is published in the Claude Opus 4.7 system card (232 pages, April 16, 2026). The alignment assessment concluded that the model is "largely well-aligned and trustworthy, though not fully ideal."
Updated: 16 Apr 2026