Recursive Self-Improvement in AI — How Systems Are Learning to Build Better Versions of Themselves

What Is Recursive Self-Improvement

The concept of recursive self-improvement (RSI) dates to 1966, when mathematician I. J. Good described an "ultraintelligent machine" capable of designing even better machines, triggering an intelligence explosion that would leave humanity far behind. For decades, RSI remained more a thought experiment than an engineering challenge.

Today, large language models (LLMs), including GPT, Gemini, Claude, and Grok, help write the code for their own successors. In February 2026, OpenAI reported that GPT-5.3-Codex was instrumental in creating itself, helping to debug training, manage deployment, and analyze evaluation results. Anthropic claims that the majority of its code is now written by Claude Code. Both systems still rely on humans to direct and verify the work.

RSI is a spectrum, not a binary state. At its strictest, researchers use the term to describe systems that can improve not just their outputs but the process by which they improve — generating ideas, evaluating results, and modifying their own methods with zero human direction. By that standard, no current system qualifies fully.

How RSI Works — Closing the Loop

Researchers have spent decades laying the foundations for RSI. Machine learning algorithms automatically tune program parameters. Evolutionary algorithms diversify and iterate on design solutions. AutoML automates parts of the pipeline in which neural networks are structured, trained, and evaluated.

Today, several levels of loop closure can be distinguished:

Output level — the model helps write code for its next version (GPT-5.3-Codex, Claude Code). Humans oversee every step.
Algorithm discovery level — the system automatically explores the solution space (AlphaEvolve). Humans define goals and metrics.
Agent architecture level — an agentic AI can modify its own code and learning mechanisms (Darwin Gödel Machines). The loop is closer to closed.
Full research cycle level — from hypothesis through experiment to peer review (AI Scientist). Knowledge integration across agents.

Key Projects and Components

AlphaEvolve (Google DeepMind) is a coding agent for scientific and algorithmic discovery. It uses LLMs to guide the evolution of solutions — from optimizing neural-network architectures, to data-center scheduling, to chip design. Matej Balog from Google DeepMind notes the system has repeatedly surprised its own team with discoveries that evaded human intuition.
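
The propose, evaluate, and select loop behind this kind of LLM-guided evolution can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AlphaEvolve's actual implementation: the `propose` function is a hypothetical stand-in for the LLM (here it only nudges one coefficient at random), the candidates are simple polynomial coefficient lists rather than real programs, and the names `evaluate`, `propose`, and `evolve` are invented for this example.

```python
import random

def evaluate(coeffs, samples):
    """Human-defined metric: negative mean squared error on the target data."""
    err = 0.0
    for x, y in samples:
        pred = sum(c * x**i for i, c in enumerate(coeffs))
        err += (pred - y) ** 2
    return -err / len(samples)

def propose(parent, rng):
    """Hypothetical stand-in for the LLM: return an edited copy of the parent."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, 0.5)
    return child

def evolve(samples, generations=200, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [[0.0, 0.0, 0.0] for _ in range(population_size)]
    history = []  # best score after each generation
    for _ in range(generations):
        # Each child is a proposed edit of a randomly chosen survivor.
        children = [propose(rng.choice(population), rng)
                    for _ in range(population_size)]
        # Parents and children compete; only the top scorers survive.
        pool = population + children
        pool.sort(key=lambda c: evaluate(c, samples), reverse=True)
        population = pool[:population_size]
        history.append(evaluate(population[0], samples))
    return population[0], history

# Toy task: recover the coefficients of y = 2x^2 - 3x + 1 from sampled points.
samples = [(x, 2.0 * x * x - 3.0 * x + 1.0) for x in range(-5, 6)]
best, history = evolve(samples)
```

Because parents stay in the pool alongside their children, the best score in `history` never gets worse from one generation to the next. The goal and the metric live entirely in `evaluate`, which is the part humans still define.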

Darwin Gödel Machines (DGM) come from researchers at the University of British Columbia and Sakana AI. The system uses evolutionary algorithms to improve LLM-based coding agents. Critically, the agents can alter their own code (though not the underlying LLM), and a newer version can even alter the meta-mechanisms by which it improves itself.
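
The difference between changing a solution and changing the mechanism that produces changes can be illustrated with a toy archive-based loop. The sketch below is an analogy, not the DGM method: each hypothetical `Agent` carries numeric `params` (standing in for its code) and a `step` size (standing in for its meta-mechanism), and both are rewritten on every self-modification. The `benchmark` function, the target values, and the rule that only children who beat their parent are archived are all assumptions made for this example.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Agent:
    params: tuple  # stands in for the agent's own code
    step: float    # stands in for its meta-mechanism: how boldly it edits itself

def benchmark(agent):
    """Fixed external score (higher is better); agents cannot edit it."""
    target = (3.0, -1.0)
    return -sum((p - t) ** 2 for p, t in zip(agent.params, target))

def self_modify(agent, rng):
    """Produce a child whose meta-mechanism and parameters are both rewritten."""
    new_step = agent.step * rng.choice((0.5, 1.0, 2.0))
    new_params = tuple(p + rng.gauss(0.0, new_step) for p in agent.params)
    return Agent(new_params, new_step)

def run(iterations=300, seed=1):
    rng = random.Random(seed)
    archive = [Agent((0.0, 0.0), 1.0)]
    for _ in range(iterations):
        parent = rng.choice(archive)  # any archived agent may become a parent
        child = self_modify(parent, rng)
        # Assumption of this sketch: archive only children that beat their parent.
        if benchmark(child) > benchmark(parent):
            archive.append(child)     # older agents are never deleted
    return archive

archive = run()
best = max(archive, key=benchmark)
```

The random number generator plays the role of the frozen underlying model: agents draw on it but never change it. Keeping every accepted agent in the archive, rather than only the current best, lets later generations branch from older variants.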

AI Scientist is a project from the same team at the University of British Columbia and Sakana AI, published in Nature in March 2026. It goes further than DGM: rather than just improving an agent's code, it automates the entire scientific cycle — generating a hypothesis, running experiments in software, writing up the paper, and then autonomously peer-reviewing it. This matters because in classical RSI the bottleneck is evaluation: someone must judge whether a result is good. AI Scientist attempts to close that loop without human involvement.

Ricursive Intelligence is a startup founded by the co-leads of AlphaChip — Google DeepMind's earlier chip-design system. Its mission is to use AI to design better chips on which even better AI can be trained. This is one of the clearest hardware-level RSI examples: a stronger chip enables faster training, which produces a better model, which designs a better chip. Co-founder Azalia Mirhoseini expects to compress the design cycle from one or two years to days. The roadmap has three phases: first assisting human designers, then fully automating the process for companies without in-house hardware teams, and finally — in phase three — closing the recursive loop, still under human supervision.

Differences vs. Other Approaches

RSI differs from standard AutoML and fine-tuning mainly in how much of the improvement loop stays in human hands. AutoML automates the choice of network architecture and hyperparameters, but humans define the goal. Reinforcement learning from human feedback (RLHF) aligns model behavior using human-labeled data. AlphaEvolve and DGM go further by generating new algorithms and modifying agent code. Full RSI, which remains theoretical, would require a system capable of redefining the problem space itself, including the goals and success metrics.

Applications

Algorithm optimization — AlphaEvolve discovers algorithms outperforming human solutions in mathematics, scheduling, and chip design.
Scientific research automation — AI Scientist closes the loop from hypothesis to peer-reviewed paper.
Software development — Claude Code and GPT-5.3-Codex reduce debugging and deployment time.
AI hardware design — Ricursive Intelligence aims to reduce the chip design cycle from years to days.

Limitations

Barriers remain substantial. Jeff Clune from the University of British Columbia admits AI is "merely decent" at generating, implementing, and judging ideas. Dean Ball from the Foundation for American Innovation notes that AI systems acting as scientists still don't match the best human scientists: "Maybe eventually they're going to automate the genius — but not next year."

Nathan Lambert from the Allen Institute for AI introduces the concept of lossy self-improvement (LSI), arguing that increasing complexity in large AI systems creates growing friction that slows the flywheel rather than accelerating it.
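
A toy numerical model can make the flywheel argument concrete. The sketch below is purely illustrative and is not Lambert's own formulation: it assumes each improvement cycle multiplies capability by a fixed `gain`, scaled by an efficiency of `1 / (1 + friction * capability)`, so that a larger and more complex system converts less of each cycle into progress. The function names, the friction formula, and all parameter values are assumptions chosen for the example.

```python
def simulate(cycles, gain=0.5, friction=0.0, start=1.0):
    """Return capability after each improvement cycle, starting value included."""
    capability = start
    trajectory = [capability]
    for _ in range(cycles):
        # Assumed friction model: efficiency falls as the system grows.
        efficiency = 1.0 / (1.0 + friction * capability)
        capability *= 1.0 + gain * efficiency
        trajectory.append(capability)
    return trajectory

def growth_rates(trajectory):
    """Relative improvement achieved in each cycle."""
    return [b / a - 1.0 for a, b in zip(trajectory, trajectory[1:])]

lossless = simulate(20, friction=0.0)  # frictionless loop: constant 50% per cycle
lossy = simulate(20, friction=0.2)     # with friction: each cycle yields less
```

With `friction=0.0` the per-cycle growth rate stays constant and capability compounds exponentially. With any positive friction the rate shrinks every cycle: capability keeps rising, but the loop decelerates instead of exploding.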

Full RSI would require not just designing software and chips, but building data centers, running power plants, and mining metals. Knowledge is distributed and often tacit: the capabilities of chip manufacturer TSMC emerge from the collective intelligence of 90,000 interacting employees. Meta researchers Jason Weston and Jakob Foerster propose an alternative: co-improvement — keeping humans in the loop for faster and safer progress.

Why It Matters

RSI is a central AI safety concern. David Scott Krueger from the University of Montreal, who surveyed 25 AI experts on automating AI R&D, warns that nearly all entertain the notion of an intelligence explosion, and that AI companies are likely to keep self-improving models internal and outside public oversight. His nonprofit Evitable advocates for a global pause when 99 percent of code is written by AI — a threshold he believes "we may be crossing about now."

Paradoxically, Clune — an RSI enthusiast — says he would willingly "give up his hobby to cure cancer." He also points to an evolutionary scenario: RSI may not look like one big brain growing ever larger, but rather like a Cambrian explosion of artificial life forms — diverse agents forming their own ecosystems, cultures, and economies. Human scientists will not disappear overnight; their role will evolve from low-level tasks through research direction to strategic oversight.