Multimodal

Spatial Intelligence

2024 · Active · Published

Key innovation

A shift from flat image- and text-based machine intelligence to perception, reasoning, generation and interaction in three-dimensional space - uniting computer vision, language models, 3D graphics and embodied control under a single world-models paradigm.

How it works

Spatial intelligence is realized through world models that learn 3D representations from multimodal data: images, video, depth sensors, text descriptions and interaction logs. A typical pipeline combines four stages: (1) 3D perception, which recovers geometry from 2D inputs (NeRF, 3D Gaussian Splatting, depth models); (2) a world representation, held either as a latent space or as an explicit 3D mesh; (3) reasoning and dynamics prediction over that representation using transformers or diffusion models; and (4) action, expressed as generated images, generated 3D scenes or robot policies (Vision-Language-Action models). These models are trained on large video and embodied datasets so that they capture both the appearance and the physics of the world.
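Stage (1) can be illustrated with its simplest geometric case: lifting a depth map into a 3D point cloud using a pinhole camera model. The sketch below is only an illustration under stated assumptions, not NeRF, Gaussian Splatting or any particular product's method. The function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy` and the tiny depth map are all hypothetical values chosen for readability.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift a depth map (distance along the camera z-axis) to camera-frame 3D points.

    Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with non-positive depth are treated as missing and skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):      # v: pixel row (image y)
        for u, z in enumerate(row):      # u: pixel column (image x)
            if z <= 0.0:
                continue                 # no valid depth measurement here
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Hypothetical 2x2 depth map and intrinsics, chosen only for illustration.
depth_map = [[2.0, 2.0],
             [0.0, 4.0]]
cloud = backproject_depth(depth_map, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(cloud)
```

The resulting point list is one possible explicit form of the world representation in stage (2). Learned approaches such as NeRF or Gaussian Splatting replace this fixed geometric step with an optimized scene representation, but they rely on the same camera geometry.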

Problem solved

Conventional AI models handle text and 2D images well but struggle with the three-dimensional structure of the world: its physics, its scene geometry and the consequences of physical action. Spatial intelligence addresses this gap by giving machines 3D representations sufficient for reasoning, planning and acting in space - a prerequisite for general-purpose robotics, immersive environments and generative 3D graphics.

Evolution

1983

Howard Gardner - theory of multiple intelligences

The term 'spatial intelligence' originates in cognitive psychology as one of Gardner's multiple intelligences.

2020

NeRF (Neural Radiance Fields)

Inflection point

Mildenhall et al. publish NeRF - a breakthrough in neural 3D reconstruction from 2D images. Ben Mildenhall later co-founds World Labs.

2023

3D Gaussian Splatting

Kerbl et al. introduce a fast, photorealistic 3D representation that becomes critical for scalable spatial perception.

2024

Fei-Fei Li's TED Talk and the founding of World Labs

Inflection point

In April 2024 Fei-Fei Li gives the TED Talk 'With Spatial Intelligence, AI Will Understand the Real World'. In September 2024 she unveils World Labs as a spatial intelligence company, canonizing the term within the AI industry.

2024

Google DeepMind Genie 1/2

DeepMind unveils generative interactive world models as a parallel realization of the spatial intelligence paradigm.

2025

World Labs Marble

World Labs ships Marble - a product generating spatially coherent, persistent 3D worlds from a single image, video or text prompt.

Sources

With spatial intelligence, AI will understand the real world