Abstraction and Reasoning Corpus for AGI

A benchmark designed to measure "fluid intelligence" in AI: the ability to abstract and reason on entirely novel tasks using only core knowledge priors (shared by humans), so that scores cannot be "bought" through massive training data.

Common pitfalls

Gap between training set and private test set

HIGH

Good scores on the public training and evaluation sets do not guarantee good performance on the private test set used for ARC Prize evaluation.

Report results from the private set, obtained through the official ARC Prize competition, and treat public-set scores as development signals only.

Overfitting to known tasks

CRITICAL

Systems trained on known ARC tasks may overfit to their specific patterns without demonstrating genuine reasoning.

Use new tasks (ARC-AGI-2/3) and evaluate on the private test set.
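
The two pitfalls above can be made concrete with a small sketch. It assumes tasks are dicts with `train` and `test` lists of `input`/`output` grids; the task data, the `MemorizingSolver` class, and the `accuracy` helper are all hypothetical. The solver simply memorizes answers for tasks it has seen, so it scores perfectly on known tasks and fails on held-out ones, which is the kind of gap a private test set is meant to expose.

```python
def make_task(test_input, test_output):
    """Build a minimal hypothetical task with a single test pair."""
    return {"train": [], "test": [{"input": test_input, "output": test_output}]}


# Tasks available during development (hypothetical data).
known_tasks = {
    "a": make_task([[1, 2]], [[2, 1]]),
    "b": make_task([[0, 3]], [[3, 0]]),
}
# Tasks kept out of sight, standing in for a private test set.
held_out_tasks = {
    "c": make_task([[4, 5]], [[5, 4]]),
    "d": make_task([[6, 0]], [[0, 6]]),
}


class MemorizingSolver:
    """Hypothetical solver that looks up answers instead of reasoning."""

    def __init__(self, tasks):
        self.memory = {}
        for task in tasks.values():
            for pair in task["test"]:
                self.memory[self._key(pair["input"])] = pair["output"]

    @staticmethod
    def _key(grid):
        # Lists are not hashable, so convert the grid to nested tuples.
        return tuple(tuple(row) for row in grid)

    def __call__(self, grid):
        # Fall back to returning the input unchanged for unseen grids.
        return self.memory.get(self._key(grid), grid)


def accuracy(tasks, solver):
    """Fraction of test pairs solved by exact match across all tasks."""
    pairs = [pair for task in tasks.values() for pair in task["test"]]
    solved = sum(solver(pair["input"]) == pair["output"] for pair in pairs)
    return solved / len(pairs)


solver = MemorizingSolver(known_tasks)
known_acc = accuracy(known_tasks, solver)
held_out_acc = accuracy(held_out_tasks, solver)
print(f"known: {known_acc:.2f}, held-out: {held_out_acc:.2f}, "
      f"gap: {known_acc - held_out_acc:.2f}")
```

A large difference between the two numbers suggests memorization rather than reasoning. A held-out split of public tasks is only a proxy here; it does not replace evaluation on the private test set.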

Reference implementations

ARC-AGI – official repository (GitHub) · official

Python
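
As a rough illustration of working with this kind of data, the sketch below parses one task and scores a solver by exact match. It assumes the JSON layout used by the public ARC-AGI task files (a `train` and a `test` list of `input`/`output` pairs, each grid a list of rows of integers 0-9). The inline task and the `flip_horizontal` solver are hypothetical stand-ins, not taken from the repository.

```python
import json

# A hypothetical task in the train/test layout assumed above.
TASK_JSON = """
{
  "train": [
    {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
    {"input": [[3, 3, 0]], "output": [[0, 3, 3]]}
  ],
  "test": [
    {"input": [[0, 5], [7, 0]], "output": [[5, 0], [0, 7]]}
  ]
}
"""


def flip_horizontal(grid):
    """Hypothetical solver: mirror every row left to right."""
    return [list(reversed(row)) for row in grid]


def score_task(task, solver):
    """Return the fraction of test pairs whose output is reproduced exactly."""
    pairs = task["test"]
    solved = sum(solver(pair["input"]) == pair["output"] for pair in pairs)
    return solved / len(pairs)


task = json.loads(TASK_JSON)

# Check the candidate rule against the demonstration pairs first.
fits_train = all(flip_horizontal(p["input"]) == p["output"] for p in task["train"])
test_score = score_task(task, flip_horizontal)
print(fits_train, test_score)
```

Scoring in this sketch is all-or-nothing per grid: a single wrong cell counts as a failure, so partially correct outputs earn no credit.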

History and evolution

GENESIS · Source paper

On the Measure of Intelligence

arXiv, 2019 · Francois Chollet

2019

ARC and "On the Measure of Intelligence" paper published

breakthrough

Francois Chollet defines intelligence as skill-acquisition efficiency and introduces the ARC benchmark.

2024

ARC Prize 2024 – first systems exceed 55% on private test set

breakthrough

Public Kaggle competition with $1M prize pool attracts hundreds of teams; LLM+program synthesis hybrids exceed 55%.

2025

ARC-AGI-2 and ARC-AGI-3 – new, harder versions

ARC Prize Foundation releases new benchmark versions with harder tasks as models begin saturating ARC-AGI-1.

Preferred hardware

Hardware agnostic · PRIMARY

Pixel-grid benchmark; evaluation is hardware-agnostic, although solver programs may use GPUs.

Sources

Title	Publisher	Type
On the Measure of Intelligence	arXiv	scientific article
ARC Prize – official website	ARC Prize Foundation	official website
ARC-AGI GitHub Repository	GitHub	repository
