On June 8, 2026, Anthropic published a detailed report on how far its own research and engineering work has been automated with Claude models. The numbers are concrete: more than 80% of the code merged to its production codebase is written by Claude, and the typical engineer in Q2 2026 merges 8 times more code per day than in 2024. The company states plainly that recursive self-improvement (RSI), the ability of an AI system to autonomously design its own successor, could arrive sooner than most institutions are prepared for.
Key takeaways
- More than 80% of code merged to Anthropic's production codebase in May 2026 was authored by Claude
- The typical engineer, working with Claude, merges 8x more code per day in Q2 2026 than in 2024
- Claude Opus 4.6 completes 12-hour tasks (May 2026); in March 2024, Claude Opus 3 handled 4-minute tasks
- Anthropic announces plans to organize conversations with policymakers, researchers, and other AI companies on coordinating a potential slowdown of frontier AI work
- The document describes three future scenarios: capability stagnation, continued acceleration, and full RSI
The shift to the agentic era: internal data
The report, co-authored by Marina Favaro and Jack Clark of the Anthropic Institute, presents internal data that had not previously been made public. The key figure: more than 80% of the code merged to production is Claude's work, not humans'. Before Claude Code launched in research preview in February 2025, that share was in the single digits.
The productivity gains are not simply a matter of automated code generation. The report identifies two inflection points. The first came in 2025, when Claude stopped merely suggesting code for engineers to copy and paste and began operating development environments on its own and testing the results. The second came in 2026, when models began working autonomously over longer time horizons. In Q2 2026, the typical engineer merges 8x more code per day than in 2024.
Equally significant is the task-duration metric. Data from METR, an independent organization that measures AI capabilities, shows that the time horizon of tasks Claude can complete autonomously (measured by how long those tasks take a skilled human) doubles roughly every four months. In March 2024, Claude Opus 3 handled 4-minute tasks. About a year later, Claude Sonnet 3.7 handled 1.5-hour tasks. By May 2026, Claude Opus 4.6 completes 12-hour tasks. If the trend holds, multi-day tasks could come into range later this year.
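The "doubles roughly every four months" claim can be checked against the three data points in the paragraph above. The sketch below is a back-of-the-envelope calculation, not METR's methodology: it assumes simple exponential growth between points, treats "a year later" as exactly 12 months after March 2024, and expresses every horizon in minutes. The helper names `doubling_time` and `project` are illustrative only.

```python
import math

# Task horizons quoted in the article, as (model, months since March 2024, minutes).
# Assumption: "a year later" is taken as month 12; May 2026 is month 26.
OBSERVATIONS = [
    ("Claude Opus 3", 0, 4.0),
    ("Claude Sonnet 3.7", 12, 90.0),
    ("Claude Opus 4.6", 26, 720.0),
]


def doubling_time(t0: float, h0: float, t1: float, h1: float) -> float:
    """Months per doubling implied by exponential growth from h0 (at t0) to h1 (at t1)."""
    return (t1 - t0) / math.log2(h1 / h0)


def project(h0: float, months_ahead: float, doubling_months: float) -> float:
    """Horizon after months_ahead, assuming the doubling time stays constant."""
    return h0 * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # Doubling time implied by each consecutive pair of data points.
    for (m0, t0, h0), (m1, t1, h1) in zip(OBSERVATIONS, OBSERVATIONS[1:]):
        print(f"{m0} -> {m1}: {doubling_time(t0, h0, t1, h1):.1f} months per doubling")

    # Doubling time implied by the full span, first point to last.
    _, t_first, h_first = OBSERVATIONS[0]
    _, t_last, h_last = OBSERVATIONS[-1]
    print(f"Full span: {doubling_time(t_first, h_first, t_last, h_last):.1f} months per doubling")

    # Project forward from May 2026 using the article's "roughly four months".
    for months in (4, 8):
        hours = project(h_last, months, 4.0) / 60
        print(f"May 2026 + {months} months: ~{hours:.0f} hours")
```

Under these assumptions the two intervals imply doubling times of about 2.7 and 4.7 months, and the full span implies about 3.5 months, which is consistent with "roughly every four months". Projecting from 12 hours at a four-month doubling time gives about 24 hours by September 2026 and about 48 hours by January 2027; real progress need not follow a clean exponential, which is the possibility the first scenario below considers.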
Toward research: from executor to director
The report is not limited to coding. Anthropic also presents results from experiments in which Claude conducted research autonomously.
In a code-optimization benchmark run at each model release, Claude is given code that trains a small AI model and is asked to make it faster without breaking its correctness. Claude Opus 4 (May 2025) achieved a 3x speedup; Claude Mythos Preview (April 2026) achieves 52x. For comparison, an experienced human researcher working for 4 to 8 hours achieves 4x.
The report also describes a more open-ended research task. In April 2026, Anthropic published the first demonstration of Claude agents running an entire AI safety research project end to end. The research question was whether a weaker model can reliably supervise a stronger one. Two human researchers recovered 23% of the possible "gap" in about a week. Claude agents recovered 97% over 800 cumulative compute hours, at roughly $18,000 in compute (about $22.50 per compute hour). Humans still chose the problem and wrote the scoring rubric, but the agents designed every experiment themselves.
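Studies of whether a weaker model can supervise a stronger one commonly report results as performance gap recovered (PGR): the share of the distance between the weak supervisor's own score and the strong model's ceiling (its score when trained on ground-truth labels) that the weakly supervised strong model closes. The article does not define its "gap", so the sketch below assumes this standard metric; the function name and all scores are hypothetical, chosen only to show what 23% and 97% would mean.

```python
def performance_gap_recovered(weak: float, weak_to_strong: float, strong_ceiling: float) -> float:
    """Fraction of the weak-to-ceiling gap closed by the weakly supervised strong model.

    weak: score of the weak supervisor on its own
    weak_to_strong: score of the strong model trained on the weak supervisor's labels
    strong_ceiling: score of the strong model trained on ground-truth labels
    """
    gap = strong_ceiling - weak
    if gap <= 0:
        raise ValueError("strong_ceiling must exceed the weak supervisor's score")
    return (weak_to_strong - weak) / gap


# Hypothetical scores: weak supervisor 0.60, strong ceiling 0.90 (a 0.30 gap).
WEAK, CEILING = 0.60, 0.90
print(round(performance_gap_recovered(WEAK, 0.669, CEILING), 2))  # 0.069 / 0.30 -> 0.23
print(round(performance_gap_recovered(WEAK, 0.891, CEILING), 2))  # 0.291 / 0.30 -> 0.97
```

Under this definition, 0 means the strong model did no better than its weak supervisor and 1 means it matched the ceiling, so a 97% result leaves almost none of the gap unclosed.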
Three future scenarios
The report outlines three possible paths.
Scenario one: trends stagnate, but current AI capabilities diffuse widely. Capabilities may follow an S-curve rather than an exponential, with energy, chips, or architecture as the binding constraint. Even if models were frozen at today's level, the change would be enormous: a 100-person company could increasingly do the work of a 1,000-person organization. Anthropic considers this the least likely scenario.
Scenario two: AI labs continue to see compounding efficiency gains, but humans retain the role of setting research directions. Individual productivity explodes. A 100-person company could match a 10,000- or 100,000-person organization. This is also a risk scenario: the same infrastructure can be used for mass surveillance or influence operations.
Scenario three: AI achieves full RSI and begins designing its own successors. In this world, the pace of AI progress would be determined entirely by compute availability. The human role would shrink to oversight and verification of a "virtual lab" run by AI systems. Anthropic admits it has "the worst intuitions" about what such a world would look like.
Coordination and the possibility of a slowdown
Anthropic poses the question directly: should the industry have the option of a coordinated slowdown or pause? The company is not calling for a unilateral stop; that, it writes, would only change who leads the race rather than create the deliberative process that is needed. Instead, Anthropic announces plans to organize conversations with policymakers, researchers, civil society, and other AI companies about the conditions for a verifiable pause.
This is unprecedented for the industry: a company at the top of the capability hierarchy has written publicly that if a verifiable coordination mechanism existed, it would take part in it, provided other frontier companies did the same.
Why this matters
Anthropic's report stands out from similar industry documents for one reason: it contains numbers from the company's own internal systems, not just external benchmarks. The figures on Claude authoring more than 80% of production code and on engineers merging 8x more code per day put the company on the record: it chose to disclose them and can be held accountable for them.
This shifts the RSI discussion from academic to operational. Recursive self-improvement is no longer just a concept from safety documents; it is a direction in which Anthropic is moving in a planned, measurable way. What still separates the current state from full RSI is the ability to choose problems and evaluate results, but the report shows Claude steadily moving into those areas as well.
What's next
- Anthropic announced plans to organize conversations with policymakers, researchers, and other AI companies on the conditions for a coordinated pause; no timeline was given.
- Internal goal: Claude Code is expected to produce code of "strictly better" quality than human-written code within a year of the report's date (before June 2027).
- METR's trend data suggest Claude's task horizon could extend to several days later in 2026, which would cross the next threshold in the automation of research work.
Sources
- Anthropic — When AI builds itself
- METR — Measuring AI ability to complete long-horizon tasks