OpenAI's voice model with GPT-5-class reasoning, parallel tool calls and a 128K-token context window, available via the Realtime API.
Context window: 128K tokens
Release date: 7 May 2026
Access: API
Deployment: Cloud
Overview
Access & deployment
Access: API
Deployment: Cloud
Weights: Closed
Key parameters
Context: 128K
Tools: supported
Input: audio, text
Technical specification
Context window: 128K tokens
Features: Tool use
Modalities
Input: audio, text
Output: audio, text
Capabilities and applications
Native model capabilities
Audio understanding
Category: audio
Voice Conversation
Ability to conduct multi-turn real-time voice conversations with context retention and natural speech pacing.
Category: speech
Live Translation
Real-time speech translation between multiple languages without interrupting the audio stream.
Category: speech
Streaming Speech-to-Text
Real-time conversion of speech to text with immediate output as the speaker is talking.
Category: speech
Parallel Tool Calls
Ability to invoke multiple external tools simultaneously while generating a response.
Category: reasoning
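As a rough illustration of what handling parallel tool calls can look like on the client side, the sketch below runs several requested tools concurrently and returns their results in request order. The tool functions, the `TOOLS` registry, and the call shape (`call_id`, `name`, `arguments`) are all hypothetical and are not taken from the Realtime API; they only show the dispatch pattern.

```python
import asyncio

# Hypothetical tools; names and return values are illustrative only.
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # stands in for network latency
    return f"weather for {city}"

async def get_calendar(day: str) -> str:
    await asyncio.sleep(0.01)  # stands in for network latency
    return f"events on {day}"

# Hypothetical registry mapping tool names to coroutine functions.
TOOLS = {"get_weather": get_weather, "get_calendar": get_calendar}

async def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Run every requested tool concurrently; results keep request order."""
    tasks = [TOOLS[c["name"]](**c["arguments"]) for c in calls]
    outputs = await asyncio.gather(*tasks)  # gather preserves input order
    return [{"call_id": c["call_id"], "output": o} for c, o in zip(calls, outputs)]

def demo() -> list[dict]:
    # Assumed call shape, not the actual wire format of any API.
    calls = [
        {"call_id": "a", "name": "get_weather", "arguments": {"city": "Oslo"}},
        {"call_id": "b", "name": "get_calendar", "arguments": {"day": "Monday"}},
    ]
    return asyncio.run(run_tool_calls(calls))
```

Because both tools are awaited together, total latency is close to that of the slowest tool rather than the sum of all of them, which matters for keeping a voice response from stalling.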
Benchmark results
2 benchmarks
Big Bench Audio
relative improvement · GPT-Realtime-2 (high)
+15.2% vs GPT-Realtime-1.5
Source: OpenAI
Audio MultiChallenge
relative improvement · GPT-Realtime-2 (xhigh)
+13.8% vs GPT-Realtime-1.5
Source: OpenAI
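The figures above are relative improvements, not absolute scores, and the underlying benchmark scores are not given here. Assuming the usual definition (the change divided by the baseline), the arithmetic can be sketched as follows; the scores in the example are hypothetical and chosen only to show how a relative gain differs from a gain in percentage points.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the relative change from baseline to new, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0

# Hypothetical scores, not actual benchmark results.
example = relative_improvement(baseline=50.0, new=57.6)
```

With these made-up inputs, the score rises by 7.6 points, while the relative improvement is roughly 15.2%. The same reported percentage can therefore correspond to very different absolute gains depending on the baseline.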
Technical architecture
Core Architecture
