Sora


1 · Family: Sora
OpenAI's text-to-video diffusion transformer model. It generates clips up to 60 seconds long at up to 1080p from a text prompt, an image or another video.
✓ Active · ✓ Public access · Video generation · Sora
Release date
15 February 2024
Access: Hosted · Deployment: Cloud

Overview

Sora is a text-to-video generative model developed by OpenAI, announced on 15 February 2024 in the technical report "Video generation models as world simulators". The model was publicly released on 9 December 2024 as the Sora Turbo variant, available to ChatGPT Plus and Pro subscribers via sora.com.

Architecture

Sora is a diffusion transformer (DiT). Videos and images are represented as collections of spacetime patches, analogous to tokens in large language models. The model operates in a compressed latent space (latent diffusion) and generates video through iterative denoising: it starts from noisy patches and repeatedly predicts cleaner ones. The architecture scales well: more training compute translates into higher quality, longer and more consistent shots.
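To make the spacetime-patch representation concrete, here is a minimal pure-Python sketch of cutting a video into patches. It assumes the (latent) video is a nested list indexed as video[t][h][w] and that the patch sizes pt, ph and pw divide its dimensions evenly. The function name and patch sizes are hypothetical and this is not OpenAI's implementation; a real DiT would also project each flattened patch to an embedding vector, which is omitted here.

```python
def to_spacetime_patches(video, pt, ph, pw):
    """Split a video indexed as video[t][h][w] into flattened spacetime patches.

    Hypothetical helper for illustration only. Each patch covers pt frames x
    ph rows x pw columns; patches are returned in time, then row, then column
    order, each as a flat list of values.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("patch sizes must divide the video dimensions")
    patches = []
    for t0 in range(0, T, pt):          # step through time in blocks of pt frames
        for h0 in range(0, H, ph):      # then through rows in blocks of ph
            for w0 in range(0, W, pw):  # then through columns in blocks of pw
                patch = [
                    video[t][h][w]
                    for t in range(t0, t0 + pt)
                    for h in range(h0, h0 + ph)
                    for w in range(w0, w0 + pw)
                ]
                patches.append(patch)  # one "token" in the transformer's input sequence
    return patches


# Toy 4-frame, 4x4 "latent video" whose values encode their (t, h, w) position.
video = [[[t * 100 + h * 10 + w for w in range(4)] for h in range(4)] for t in range(4)]
tokens = to_spacetime_patches(video, pt=2, ph=2, pw=2)
print(len(tokens), tokens[0])  # prints: 8 [0, 1, 10, 11, 100, 101, 110, 111]
```

Each patch mixes neighbouring pixels across both space and time, which is what lets a single token sequence describe motion as well as appearance.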

Capabilities

Sora generates clips up to 60 seconds long at resolutions up to 1080p, in multiple aspect ratios (such as 1:1, 16:9, 9:16). It supports three basic scenarios: text-to-video (video from a description), image-to-video (animation of an input image) and video-to-video (extending, blending and remixing existing clips). The model exhibits an advanced understanding of camera motion, multiple characters, physics and visual language.
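Support for several aspect ratios follows from the patch representation: a frame of a different shape simply yields a different number of patches. The sketch computes 1080p frame sizes for the listed ratios and a per-frame patch count. The 8x latent downsampling factor and 2x2 patch size are illustrative assumptions, not values published by OpenAI, and all function names are hypothetical.

```python
def frame_size(aspect_w, aspect_h, short_side=1080):
    """Pixel (width, height) for an aspect ratio, keeping the shorter side fixed."""
    if aspect_w >= aspect_h:  # landscape or square: height is the short side
        return short_side * aspect_w // aspect_h, short_side
    return short_side, short_side * aspect_h // aspect_w  # portrait: width is short


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def num_spacetime_patches(frames, width, height, spatial_down, pt, ph, pw):
    """Patch count after a hypothetical spatial_down-x latent compression.

    Partial patches at the edges are counted as if padded to a full patch.
    """
    lat_w, lat_h = width // spatial_down, height // spatial_down
    return ceil_div(frames, pt) * ceil_div(lat_h, ph) * ceil_div(lat_w, pw)


for ratio in [(1, 1), (16, 9), (9, 16)]:
    w, h = frame_size(*ratio)
    # Illustrative settings only: 8x spatial downsampling, 1-frame x 2x2 patches.
    n = num_spacetime_patches(frames=1, width=w, height=h,
                              spatial_down=8, pt=1, ph=2, pw=2)
    print(f"{ratio[0]}:{ratio[1]} -> {w}x{h}, {n} patches per frame")
```

Under these assumed settings a 1080x1080 frame gives 4624 patches, while 1920x1080 and 1080x1920 both give 8160: the same model sees a longer or shorter token sequence rather than a resized image.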

Availability

Sora is available as a hosted product at sora.com and inside the ChatGPT app for Plus and Pro subscribers, with daily generation quotas and length and resolution caps that depend on the plan. The model weights are not publicly released. Generated videos carry C2PA metadata and watermarks indicating AI provenance.

Successor

On 30 September 2025 OpenAI announced Sora 2, a next-generation model with improved physics, controllability and synchronised audio generation. Sora 2 is a separate model; this entry covers the first-generation Sora line (Sora 1 / Sora Turbo).

Classification
Video generation
Family: Sora
Access & deployment
Hosted
Cloud
Weights: Closed
Key parameters
Input: text, image, video

Technical specification

Max output tokens
Not applicable (the model outputs video, not text tokens)
Modalities
Input: text, image, video
Output: video

Capabilities and applications

Native model capabilities
Video generation
The model's ability to generate video clips from a text prompt, image or another video, with control over length, resolution and visual characteristics.
Category: video
Image-to-video
The model's ability to animate a static input image, extending it in time into a consistent video clip that follows a description of motion or action.
Category: video
Video understanding
The model's ability to analyse and interpret video content: recognising actions, motion, events and relationships between objects over time.
Category: video

Technical architecture