Robots Atlas

GPT Realtime 2

2 · Family: GPT
OpenAI's voice model with GPT-5-class reasoning, parallel tool calls and a 128K-token context window, available via the Realtime API.
✓ Active · ✓ Public access · Audio · Multimodal · Reasoning model · GPT
Context window: 128K tokens
Release date: 7 May 2026
Access: API · Deployment: Cloud

Overview

GPT-Realtime-2 is a next-generation audio model that OpenAI released on May 7, 2026, as part of the Realtime API. It combines GPT-5-class reasoning, parallel tool calls, and a 128K-token context window (up from 32K in the previous version). A new "preamble" feature lets the model speak short acknowledgement phrases ("let me check that", "one moment") before generating a full response, and announce aloud when a tool call is in progress.

On OpenAI benchmarks, GPT-Realtime-2 (high) scores 15.2% higher than its predecessor GPT-Realtime-1.5 on Big Bench Audio (audio reasoning) and 13.8% higher on Audio MultiChallenge (multi-turn conversation). Early tester Zillow reported a 26-point increase in call success rate (95% vs. 69%) after prompt optimization. The model is accessible via WebRTC, WebSocket, and SIP, with full EU Data Residency support.
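The figures above mix two kinds of measurement: the Zillow result is quoted in percentage points, while the Benchmark results section labels the benchmark figures as relative improvements. A minimal sketch of the difference, using only the 69% and 95% success rates quoted above (the function names are illustrative, not part of any OpenAI tooling):

```python
def point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute change, in percentage points."""
    return after_pct - before_pct


def relative_gain(before_pct: float, after_pct: float) -> float:
    """Change relative to the starting value, as a percentage."""
    return (after_pct - before_pct) / before_pct * 100.0


# Call success rates reported by Zillow, as quoted in the overview.
before, after = 69.0, 95.0

print(f"{point_gain(before, after):.0f} points")        # 26 points
print(f"{relative_gain(before, after):.1f}% relative")  # 37.7% relative
```

The same 26-point gain is therefore a relative gain of roughly 37.7%, so it should not be compared directly with the +15.2% and +13.8% benchmark figures.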

Classification
Audio · Multimodal · Reasoning model
Family: GPT
Access & deployment
Access: API
Deployment: Cloud
Weights: Closed
Key parameters
Context: 128K
✓ Tools
Input: audio, text

Technical specification

Context window: 128K tokens
Features: ✓ Tool use
Modalities
Input: audio, text
Output: audio, text
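To give a feel for what a fixed context window means for a long-running session, here is a minimal sketch of client-side history trimming. It assumes "128K" means 128,000 tokens, that a token count is already known for each turn, and that some headroom is reserved for the reply. The `trim_history` helper and the `reserve` value are hypothetical, and the source does not describe how the Realtime API itself handles overflow.

```python
from collections import deque

CONTEXT_LIMIT = 128_000  # assumption: "128K" read as 128,000 tokens


def trim_history(turns, limit=CONTEXT_LIMIT, reserve=4_000):
    """Drop the oldest turns until the total fits within limit - reserve.

    `turns` is a list of (text, token_count) pairs, oldest first.
    `reserve` is hypothetical headroom kept free for the model's reply.
    Returns the kept turns and their combined token count.
    """
    budget = limit - reserve
    kept = deque(turns)
    total = sum(count for _, count in kept)
    while kept and total > budget:
        _, dropped = kept.popleft()  # discard the oldest turn first
        total -= dropped
    return list(kept), total


# Made-up token counts, chosen only to exceed the budget.
history = [("intro", 60_000), ("details", 50_000), ("follow-up", 30_000)]
kept, total = trim_history(history)
print([text for text, _ in kept], total)  # ['details', 'follow-up'] 80000
```

With the previous 32K limit (`limit=32_000`), none of these example turns would fit, which illustrates why a larger window matters for long conversations.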

Capabilities and applications

Native model capabilities
Audio understanding
Category: audio
Voice Conversation
Ability to conduct multi-turn real-time voice conversations with context retention and natural speech pacing.
Category: speech
Live Translation
Real-time speech translation between multiple languages without interrupting the audio stream.
Category: speech
Streaming Speech-to-Text
Real-time conversion of speech to text with immediate output as the speaker is talking.
Category: speech
Parallel Tool Calls
Ability to invoke multiple external tools simultaneously while generating a response.
Category: reasoning
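The parallel tool call capability listed above can be pictured from the application side: when the model requests several tools in one turn, the client can run them concurrently rather than one after another. The sketch below uses Python's `asyncio.gather`. The tool names, arguments, and the shape of the `calls` list are hypothetical stand-ins and do not reproduce the Realtime API's actual event schema.

```python
import asyncio


async def lookup_listing(listing_id: str) -> dict:
    # Hypothetical tool: stands in for a slow external lookup.
    await asyncio.sleep(0.1)
    return {"listing_id": listing_id, "status": "active"}


async def check_calendar(date: str) -> dict:
    # Hypothetical tool: stands in for a calendar service.
    await asyncio.sleep(0.1)
    return {"date": date, "free": True}


# Registry mapping tool names to their implementations.
TOOLS = {"lookup_listing": lookup_listing, "check_calendar": check_calendar}


async def run_tool_calls(calls):
    """Run every requested tool call concurrently, preserving request order."""
    tasks = [TOOLS[call["name"]](**call["arguments"]) for call in calls]
    return await asyncio.gather(*tasks)


# Illustrative batch of tool calls requested in a single model turn.
calls = [
    {"name": "lookup_listing", "arguments": {"listing_id": "A-17"}},
    {"name": "check_calendar", "arguments": {"date": "2026-05-07"}},
]
results = asyncio.run(run_tool_calls(calls))
print(results)
```

Because both coroutines wait at the same time, the batch takes about as long as the slowest tool rather than the sum of all of them, which is what matters for latency in a live voice conversation.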

Benchmark results

2 benchmarks
Big Bench Audio
relative improvement · GPT-Realtime-2 (high)
+15.2% vs GPT-Realtime-1.5
Source: OpenAI
Audio MultiChallenge
relative improvement · GPT-Realtime-2 (xhigh)
+13.8% vs GPT-Realtime-1.5
Source: OpenAI

Technical architecture