Claude Opus 4.8: four times less likely to miss bugs, dynamic workflows added

Anthropic announced on May 28, 2026 the release of Claude Opus 4.8 — an upgrade to the Opus class available immediately to all users at the same price as Opus 4.7. The model brings measurable improvements in reliability, agentic tasks, and coding, while debuting with a dynamic workflows feature and effort-level controls.

Key takeaways

Opus 4.8 is four times less likely than Opus 4.7 to leave flaws in its written code unremarked
On CursorBench it exceeds all prior Opus models at every effort level
On Legal Agent Benchmark it achieves the highest score on record — the first model to break 10% on the all-pass standard
On Online-Mind2Web (browser/computer-use) it reaches 84% — a meaningful jump over Opus 4.7 and GPT-5.5
Fast mode pricing drops threefold: $10/$50 per million input/output tokens
Dynamic workflows in Claude Code enable hundreds of parallel subagents in a single session

Reliability as a core capability

One of the most significant aspects of Opus 4.8 is what Anthropic calls honesty — the tendency to flag uncertainties rather than confidently report progress on incomplete work. In agentic environments, this matters: a model that presents unverified results as confirmed can trigger cascading errors across multi-step workflows.

According to the System Card, Opus 4.8 is four times less likely than Opus 4.7 to allow flaws in code it has written to pass unremarked. Anthropic's alignment team assessed that the model "reaches new highs on measures of prosocial traits like supporting user autonomy and acting in the user's best interest." Rates of misaligned behavior — such as deception or cooperation with misuse — are substantially lower than in Opus 4.7 and comparable to Claude Mythos Preview.

Benchmark results

Opus 4.8 scores clearly above its predecessor on several key evaluations. On CursorBench — developed by Cursor to assess models on engineering tasks — it exceeds all prior Opus versions at every effort level, while achieving tool efficiency: fewer steps for equivalent intelligence. On Online-Mind2Web, which measures browser-agent capability, Opus 4.8 reaches 84% — a meaningful jump over both Opus 4.7 and GPT-5.5.

Anthropic also published a comparison table for Opus 4.8 against Opus 4.7, GPT-5.5, and Gemini 3.1 Pro across seven benchmarks covering coding, reasoning, and knowledge work. Opus 4.8 wins six of seven tests; the only loss is terminal coding, where GPT-5.5 retains the lead.

Benchmark	Opus 4.8	Opus 4.7	GPT-5.5	Gemini 3.1 Pro
Agentic coding (SWE-Bench Pro)	69.2%	64.3%	58.6%	54.2%
Agentic terminal coding (Terminal-Bench 2.1)	74.6%	66.1%	78.2%	70.3%
Multidisciplinary reasoning (Humanity’s Last Exam) — no tools	49.8%	46.9%	41.4%	44.4%
Multidisciplinary reasoning (Humanity’s Last Exam) — with tools	57.9%	54.7%	52.2%	51.4%
Agentic computer use (OSWorld-Verified)	83.4%	82.8%	78.7%	76.2%
Knowledge work (GDPval-AA, score)	1890	1753	1769	1314
Agentic financial analysis (Finance Agent v2)	53.9%	51.5%	51.8%	43.0%

On the Legal Agent Benchmark, the model is the first to break 10% on the all-pass standard, setting a new record. For professional legal workflows, this accuracy improvement translates directly into how much real attorney work can be delegated to AI with confidence.

In financial analytics tasks, testers from the investment sector reported consistently higher-quality analysis, faster task completion, and better signal-to-noise ratio — particularly the model's tendency to proactively flag problems with the inputs and outputs of an analysis that other models routinely missed. In Genie, Databricks' AI agent for data and knowledge work, the model also demonstrated a step change in agentic reasoning, tackling deeper, multistep questions faster, at 61% lower token cost than Opus 4.7.

Dynamic workflows and new features

The key infrastructure addition is dynamic workflows — available in research preview for Enterprise, Team, and Max plans. The feature lets Claude Code plan work and then run hundreds of parallel subagents within a single session, with the agents running for longer than before, and the model verifying outputs before reporting back to the user.

Anthropic's practical example: Claude Code with Opus 4.8 can now carry out a full codebase migration — spanning hundreds of thousands of lines — from kickoff to merge, using the existing test suite as the quality bar. This is a level of automation that previously required manual coordination across multiple engineers.

Alongside this, Anthropic is adding effort control in claude.ai and Cowork: users can choose how much compute effort the model invests — lower effort for faster responses and slower rate-limit consumption, higher effort for better results at greater token cost. The Messages API now also accepts system entries inside the messages array, allowing developers to update Claude's instructions mid-task without breaking the prompt cache.

Pricing and availability

Standard Opus 4.8 pricing is unchanged: $5 per million input tokens and $25 per million output tokens. Fast mode becomes three times cheaper than in previous models: $10/$50 per million tokens, at 2.5× the normal speed. For developers building high-throughput agentic systems, this is a meaningful cost reduction.

Mode	Input (/1M tok.)	Output (/1M tok.)	Speed
Opus 4.8 — standard	$5	$25	baseline
Opus 4.8 — fast mode	$10	$50	2.5× faster
Opus 4.7 — fast mode (for comparison)	$30	$150	baseline

The model is available via the `claude-opus-4-8` identifier through the Anthropic API and across all claude.ai interfaces.

Why this matters

Opus 4.8 is not merely a benchmark update — it signals a shift in how Anthropic prioritizes model development. Earlier Opus versions were evaluated primarily on intelligence and raw performance. Opus 4.8 shifts emphasis toward reliability: a model that doubts and signals uncertainty is more valuable in production environments than one that frequently errs. In agentic systems, where a single unremarked error can propagate across dozens of steps, this trait has direct consequences for system-level reliability.

The threefold reduction in fast mode pricing and the introduction of dynamic workflows lower the barrier for applications requiring high throughput and long agentic sessions, which could accelerate Opus adoption in segments where cost was previously a limiting factor.

One important forward-looking note: Anthropic has stated it is working on a new class of models with intelligence beyond Opus — Project Glasswing and the Mythos model, already being tested by a limited set of organizations for cybersecurity work. Opus 4.8 is the current step, not the final one.

What's next

Anthropic has announced Mythos-class models for general availability within the coming weeks — after developing appropriate safeguards for models at that capability level (Project Glasswing)
Dynamic workflows enters research preview — Anthropic will gather feedback before full rollout
Work is underway on models with comparable capabilities to Opus at lower cost — no timeline given