Other

RSI (Recursive Self-Improvement)

1965 · Research · Published

Key innovation

Hypothetical process in which an AI system iteratively improves its own code, architecture, or training procedures, potentially leading to exponential capability gains.

How it works

The starting point is a "seed improver": an initial system (typically an LLM or agent) that can plan, write, compile, test, and execute code, and that is given a goal and validation protocols. In a loop the agent (1) analyses its weaknesses, (2) hypothesises improvements (changes to prompts, architecture, data, reward functions, or tool code), (3) implements the changes in an isolated environment, (4) evaluates the new version against tests and metrics, and (5) adopts the new version as the baseline if it passes validation. The loop is recursive: each subsequent improver is itself the product of a previous improvement.
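The five steps can be sketched as a toy hill-climbing loop. This is a minimal illustration under strong assumptions, not an implementation of any real RSI system: the "system" is reduced to a single number, and `Candidate`, `evaluate`, `propose`, `passes_validation`, and `improvement_loop` are hypothetical names introduced here. The sketch also does not modify the improver itself, which is what makes a real loop recursive.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for one version of the system: a single tunable number."""
    param: float


def evaluate(candidate: Candidate) -> float:
    """Toy metric (assumption): higher is better, with the optimum at param == 3.0."""
    return -(candidate.param - 3.0) ** 2


def propose(baseline: Candidate, rng: random.Random) -> Candidate:
    """Steps 1-2: stand-in for 'analyse weaknesses and hypothesise an improvement'.

    A real system would generate code or prompt changes; here it is a random nudge.
    """
    return Candidate(baseline.param + rng.uniform(-0.5, 0.5))


def passes_validation(candidate: Candidate, baseline: Candidate) -> bool:
    """Step 4: the candidate must strictly beat the current baseline on the metric."""
    return evaluate(candidate) > evaluate(baseline)


def improvement_loop(seed: Candidate, iterations: int, rng_seed: int = 0) -> Candidate:
    rng = random.Random(rng_seed)
    baseline = seed
    for _ in range(iterations):
        # Step 3: a real system would build and run the change in a sandbox.
        candidate = propose(baseline, rng)
        # Step 5: adopt the candidate as the new baseline only if it validates.
        if passes_validation(candidate, baseline):
            baseline = candidate
    return baseline


if __name__ == "__main__":
    final = improvement_loop(Candidate(0.0), iterations=200)
    print(final, evaluate(final))
```

Because a candidate is adopted only when it scores higher, the baseline's metric never decreases; everything interesting in a real system hides inside `propose` and `evaluate`.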

Problem solved

The bottleneck of human-paced AI research: RSI hypothesises that a system capable of improving itself can surpass human-speed progress on architectures, training, and optimisation.

Key mechanisms

Recursive loop: output of each iteration becomes the input of the next

Improvement hypothesis generation by the system itself (code generation, prompt rewriting)

Isolated execution environment for testing changes

Validation protocols preventing regressions

Instrumental goal: system pursues improvement of its own metrics

Self-rewarding: system generates its own reward signals

Population-based search: parallel evaluation of multiple candidates (AlphaEvolve)
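Population-based search, the last mechanism listed, can be illustrated with a small evolutionary loop. This is a generic sketch under stated assumptions, not AlphaEvolve's actual algorithm: candidates are plain floats, and `fitness` and `evolve` are hypothetical names with a toy evaluation function.

```python
import random


def fitness(x: float) -> float:
    """Toy evaluation function (assumption): higher is better, optimum at x == 2.0."""
    return -abs(x - 2.0)


def evolve(population: list[float], generations: int, keep: int,
           rng: random.Random) -> list[float]:
    """Each generation, keep the `keep` best candidates and refill by mutating them."""
    size = len(population)
    for _ in range(generations):
        # Evaluate all candidates and keep the best ones unchanged (elitism).
        survivors = sorted(population, key=fitness, reverse=True)[:keep]
        # Refill the population with mutated copies of randomly chosen survivors.
        children = [rng.choice(survivors) + rng.gauss(0.0, 0.3)
                    for _ in range(size - keep)]
        population = survivors + children
    return sorted(population, key=fitness, reverse=True)


if __name__ == "__main__":
    rng = random.Random(1)
    start = [rng.uniform(-10.0, 10.0) for _ in range(20)]
    ranked = evolve(start, generations=50, keep=5, rng=rng)
    print(ranked[0], fitness(ranked[0]))
```

Keeping the survivors unchanged means the best score in the population can never get worse from one generation to the next, which is the same no-regression property the validation protocols aim for.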

Strengths & limitations

Strengths

✓ Potentially exponential capability growth without proportional increase in human resources

✓ Automated discovery of novel algorithms and architectures beyond human design capacity

✓ Adaptable to any domain with an appropriate evaluation function

✓ Reduction of R&D costs given proper safeguards

✓ Empirically validated in constrained domains (Voyager, AlphaEvolve, STOP)

Limitations

✗ Requires a reliable, automated evaluation function; without it the loop stalls (primary limitation of AlphaEvolve)

✗ Risk of reward hacking and alignment faking (demonstrated in advanced LLMs by Anthropic, 2024)

✗ No guarantee that original goals are preserved across iterations (goal drift)

✗ Possibility of emergent unintended instrumental goals (e.g. self-preservation)

✗ No formalised stopping criterion: when to halt the loop?

✗ Difficulty interpreting a system that has been modified many times

✗ Outside narrow domains (maths, code) RSI remains largely speculative today

Components

Seed improver: Bootstrap of the self-improvement loop

Initial codebase and base model with programming capabilities (reading, writing, compiling, testing, executing code). The starting point of the loop.

Recursive loop: Driving the iteration process

Mechanism for iterative prompting and task execution where the output of one iteration becomes the input of the next.

Goal / objective function: Defining direction of improvements

Stated objective ("improve your own capabilities") together with metrics and validation protocols defining what counts as improvement.

Validation suite: Protection against regression and derailment

Set of tests and benchmarks that every new version must pass to avoid regressions or violations of safety constraints.
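A validation gate of this kind can be sketched as a function that compares a candidate's metrics with the baseline's and runs extra safety checks. This is a minimal sketch under assumptions: the metric names, the `validate` function, and the `SafetyCheck` type are hypothetical, and higher metric values are assumed to be better.

```python
from typing import Callable, Mapping, Sequence

Metrics = Mapping[str, float]
SafetyCheck = Callable[[Metrics], bool]


def validate(baseline: Metrics, candidate: Metrics,
             safety_checks: Sequence[SafetyCheck] = (),
             tolerance: float = 0.0) -> list[str]:
    """Return the reasons for rejection; an empty list means the candidate may be adopted."""
    failures = []
    # Regression check: every baseline metric must be present and not fall
    # more than `tolerance` below its baseline value.
    for name, base_score in baseline.items():
        if name not in candidate:
            failures.append(f"missing metric: {name}")
        elif candidate[name] < base_score - tolerance:
            failures.append(f"regression on {name}: {candidate[name]} < {base_score}")
    # Safety constraints are checked independently of the metric comparison.
    for index, check in enumerate(safety_checks):
        if not check(candidate):
            failures.append(f"safety check {index} failed")
    return failures


if __name__ == "__main__":
    baseline = {"accuracy": 0.80, "tests_passed": 1.0}
    candidate = {"accuracy": 0.90, "tests_passed": 0.9}
    print(validate(baseline, candidate))
```

Returning the list of reasons rather than a bare boolean keeps a record of why a version was rejected, which helps when auditing a system that has been modified many times.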

Modification mechanism: Applying changes to itself

The system's ability to edit its own code, prompts, training data, or weights — the foundation of recursive change.

Implementation

Reference implementations

STOP: Self-Taught Optimizer

Voyager (LLM-powered Minecraft agent)

Python

Official

Implementation pitfalls

Reward hacking and alignment faking · Critical

A system may meet success metrics in ways inconsistent with intent; Anthropic's 2024 research demonstrated "alignment faking" in advanced LLMs.

Fix: Redundant evaluations, red-teaming, interpretability, and constraining the scope of modifications.
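One way to read "redundant evaluations" is to score each candidate with several independent evaluators and treat strong disagreement as a warning sign. The sketch below is a heuristic illustration under assumptions, not a method taken from the cited research: `redundant_score`, the `Evaluator` type, and the `max_spread` threshold are hypothetical, and scores are assumed to lie on a shared 0 to 1 scale.

```python
from statistics import median
from typing import Callable, Sequence

Evaluator = Callable[[str], float]


def redundant_score(artifact: str, evaluators: Sequence[Evaluator],
                    max_spread: float = 0.2) -> tuple[float, bool]:
    """Score an artifact with several evaluators.

    Returns (median score, suspicious flag). The flag is set when the evaluators
    disagree by more than `max_spread`, which may indicate that the artifact
    games one metric without being good by the others.
    """
    scores = [evaluator(artifact) for evaluator in evaluators]
    spread = max(scores) - min(scores)
    return median(scores), spread > max_spread


if __name__ == "__main__":
    # Hypothetical evaluators: constants stand in for real test suites or judges.
    evaluators = [lambda a: 0.95, lambda a: 0.2, lambda a: 0.3]
    print(redundant_score("candidate-v2", evaluators))
```

Using the median rather than the mean means a single inflated evaluator cannot pull the final score up on its own.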

Lack of a reliable evaluation function · High

Without an automatic and reliable evaluation function, the loop can produce variants that appear better but are actually worse (a stated limitation of AlphaEvolve).

Goal drift and loss of original intent · High

Successive iterations can gradually shift the goal representation, producing unpredictable evolution.

Instability from self-rewarding · Medium

Self-rewarding models risk collapse: they optimise for whatever their own judgement rates highly, regardless of actual quality.

Evolution

Original paper · 1965 · Advances in Computers, Vol. 6 · Irving John Good

Speculations Concerning the First Ultraintelligent Machine

1965

Concept of the "intelligence explosion" (I. J. Good)

Inflection point

In "Speculations Concerning the First Ultraintelligent Machine" Good describes a machine that can design ever-better versions of itself, producing an intelligence explosion.

2007

Seed AI — Eliezer Yudkowsky

Yudkowsky ("Levels of Organization in General Intelligence") formalises "seed AI" as an early system able to recursively improve its own cognitive architecture.

Levels of Organization in General Intelligence (paper)

2014

Superintelligence (Nick Bostrom)

Inflection point

Bostrom's "Superintelligence: Paths, Dangers, Strategies" systematises analysis of RSI and fast-takeoff intelligence-explosion scenarios.

2022

LLMs Can Self-Improve (Huang et al.)

A Google Research paper showing that an LLM can improve its own reasoning ability by fine-tuning on self-generated chain-of-thought (CoT) solutions; an early empirical signal of self-improvement in LLMs.

Large Language Models Can Self-Improve (paper)

2023

Voyager — LLM-driven agent in Minecraft

Voyager iteratively writes code, debugs, and grows a skill library, demonstrating continuous self-improvement in a task environment.

2023

STOP — Self-Taught Optimizer

Zelikman et al. propose a scaffolding program that recursively improves itself using a fixed LLM as the engine.

Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation (paper)

2024

Self-Rewarding Language Models (Meta)

Meta AI demonstrates LLMs that generate their own reward signals during training by judging their own outputs, a step toward feedback loops that are not capped at human-level supervision.

Self-Rewarding Language Models (paper)

2025

AlphaEvolve (Google DeepMind)

Inflection point

LLM-driven evolutionary coding agent that designs and optimises algorithms; made algorithmic discoveries and could in principle improve its own components.