Sora

1 · Family: Sora

OpenAI's text-to-video diffusion transformer model. Generates clips at up to 1080p from a text prompt, an image or another video; the research model produced clips of up to 60 seconds.

✓ Active · ✓ Public access · Video generation · 📁 Sora

Release date

15 February 2024

Producer: 🏢 OpenAI

Access: Hosted · Deployment: ☁ Cloud

Overview

Sora is a text-to-video generative model developed by OpenAI, announced on 15 February 2024 in the technical report "Video generation models as world simulators". The model was publicly released on 9 December 2024 as the Sora Turbo variant, available to ChatGPT Plus and Pro subscribers via sora.com.

Architecture

Sora is a diffusion transformer (DiT). Videos and images are compressed into a latent space and represented there as collections of spacetime patches, which play the role that tokens play in large language models. The model is trained on these latent patches (latent diffusion) and generates video through iterative denoising. The architecture scales with compute: more compute translates into higher-quality, longer and more consistent shots.
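The idea of spacetime patches can be illustrated with a minimal sketch. OpenAI has not published Sora's code or patch sizes, so everything below is an assumption made for illustration: the function names `patch_grid` and `patchify`, the single-channel toy latent, and the 2 x 2 x 2 patch size are all hypothetical. The sketch only shows how a (time, height, width) grid can be cut into a flat sequence of token-like patches.

```python
from itertools import product


def patch_grid(shape, patch):
    """Return how many patches fit along (time, height, width)."""
    for size, p in zip(shape, patch):
        if size % p != 0:
            raise ValueError(f"dimension {size} is not divisible by patch size {p}")
    return tuple(size // p for size, p in zip(shape, patch))


def patchify(video, patch):
    """Split a T x H x W nested list into a flat sequence of spacetime patches.

    Each patch is flattened into a list of pt * ph * pw values, so the result
    resembles a token sequence. Patch sizes are illustrative assumptions.
    """
    shape = (len(video), len(video[0]), len(video[0][0]))
    pt, ph, pw = patch
    nt, nh, nw = patch_grid(shape, patch)
    tokens = []
    for it, ih, iw in product(range(nt), range(nh), range(nw)):
        # Gather every value inside the patch at grid position (it, ih, iw).
        token = [
            video[it * pt + dt][ih * ph + dh][iw * pw + dw]
            for dt, dh, dw in product(range(pt), range(ph), range(pw))
        ]
        tokens.append(token)
    return tokens


# Toy "latent video": 4 frames of 4 x 6 values; each value encodes its position.
latent = [[[t * 100 + h * 10 + w for w in range(6)] for h in range(4)] for t in range(4)]
tokens = patchify(latent, patch=(2, 2, 2))
print(len(tokens), len(tokens[0]))  # 12 8
```

Because the number of patches depends only on the input's duration and frame size, the same scheme handles clips of different lengths, resolutions and aspect ratios by producing sequences of different lengths.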

Capabilities

The research model described in the technical report generates clips up to 60 seconds long at resolutions up to 1080p, in multiple aspect ratios (such as 1:1, 16:9, 9:16); the publicly released Sora Turbo caps clip length at 20 seconds. It supports three basic scenarios: text-to-video (video from a description), image-to-video (animation of an input image) and video-to-video (extending, blending and remixing existing clips). OpenAI describes the model as handling complex camera motion, scenes with multiple characters and plausible physical interactions, while acknowledging that its simulation of physics remains imperfect.
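As a rough illustration of how a 1080p cap maps onto different aspect ratios, the sketch below computes frame dimensions under the assumption that "1080p" fixes the shorter side at 1080 pixels. The helper `frame_size` is hypothetical and does not reflect how the product actually selects resolutions.

```python
from fractions import Fraction


def frame_size(aspect, short_side=1080):
    """Return (width, height) for an aspect ratio such as "16:9".

    Assumes the shorter side is pinned to `short_side` pixels.
    """
    w, h = (int(part) for part in aspect.split(":"))
    ratio = Fraction(w, h)
    if ratio >= 1:
        # Landscape or square: height is the short side.
        height = short_side
        width = int(height * ratio)
    else:
        # Portrait: width is the short side.
        width = short_side
        height = int(width / ratio)
    return width, height


for aspect in ("1:1", "16:9", "9:16"):
    print(aspect, frame_size(aspect))
```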

Availability

Sora is available as a hosted product at sora.com and inside the ChatGPT app for Plus and Pro plan subscribers (with daily generation quotas and length / resolution caps that depend on the plan). The model weights are not publicly released. Generations are tagged with C2PA metadata and watermarks to indicate AI provenance.

Successor

On 30 September 2025 OpenAI announced Sora 2, a next-generation model with improved physics, controllability and synchronised audio generation. Sora 2 is a separate model; this entry covers only the first-generation Sora line (Sora 1 / Sora Turbo).

Classification

Video generation

Family: Sora

Access & deployment

Hosted

Cloud

Weights: Closed

Key parameters

📥 Input: text, image, video

Technical specification

Modalities

⬇ Input

text, image, video

⬆ Output

video

Capabilities and applications

Native model capabilities

Video generation

The model's ability to generate video clips from a text prompt, image or another video, with control over length, resolution and visual characteristics.

Category: video

Image-to-video

The model's ability to animate a static input image — extending it in time into a consistent video clip according to a description of motion or action.

Category: video

Video understanding

The model's ability to analyse and interpret video content — recognising actions, motion, events and relationships between objects over time.

Category: video

Technical architecture

Core Architecture

Diffusion Model · LDM · Transformer

Model Form

World Models

Sources and related pages

4 sources

Web: Sora, OpenAI (openai.com)

Report: Video generation models as world simulators (OpenAI, Feb 15, 2024) (openai.com)

Blog: Sora is here, Sora Turbo public release (OpenAI, Dec 9, 2024) (openai.com)

Web: Sora app (sora.com)
