GPT-5.3-Codex-Spark

5.3-Codex-Spark · Family: GPT
A smaller, ultra-fast variant of GPT-5.3-Codex designed for real-time interactive coding. Runs on Cerebras WSE-3 generating over 1000 tokens/second.
โณ Previewโณ Limited accessLLMTool-using model๐Ÿ“ GPT
Context window
128K
tokens
Release date
12 February 2026
Access:APIHostedDeployment:โ˜ Cloud

Overview

GPT-5.3-Codex-Spark is a research preview of OpenAI's ultra-fast coding model, announced on 12 February 2026. It is a smaller variant of the flagship GPT-5.3-Codex and OpenAI's first model designed specifically for real-time work with the Codex coding agent. The model is served on the Cerebras Wafer-Scale Engine 3 accelerator, which lets it generate over 1000 tokens per second, fast enough to change how interacting with an AI model feels.
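
To put the throughput figure in perspective, the sketch below estimates pure decoding time for a small code edit. Only the 1000 tokens/second figure comes from the description above; the 400-token edit size and the 100 tokens/second comparison point are hypothetical values chosen for illustration, and network or queueing overhead is ignored.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time spent purely on decoding, ignoring network and queueing overhead."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical size of a small code edit (assumption, not from the source).
EDIT_TOKENS = 400

# 1000 tok/s is the lower bound quoted for Codex-Spark.
fast = generation_seconds(EDIT_TOKENS, 1000.0)
# 100 tok/s is an illustrative comparison point, not a measured figure.
slow = generation_seconds(EDIT_TOKENS, 100.0)

print(f"{EDIT_TOKENS} tokens at 1000 tok/s: {fast:.1f} s")
print(f"{EDIT_TOKENS} tokens at  100 tok/s: {slow:.1f} s")
```

Under these assumptions the edit arrives in well under a second instead of several seconds, which is the difference between waiting for a response and steering it as it appears.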

Technical profile

Codex-Spark has a 128k context window and is text-only at launch. It is tuned for making precise, minimal code edits and does not run tests automatically unless asked. The model is steerable in real time: you can interrupt it, redirect it, ask questions and see results almost instantly.

Benchmarks

According to OpenAI, Codex-Spark scores higher than GPT-5.1-Codex-mini on SWE-Bench Pro and Terminal-Bench 2.0 while finishing the same tasks in a fraction of the time required by GPT-5.3-Codex. Codex-Spark is the first member of an ultra-fast model family from OpenAI.

Availability

Codex-Spark is available as a research preview for ChatGPT Pro users in the Codex app, CLI and VS Code extension. API access is rolling out to a small set of design partners. The model runs on a separate ultra-low-latency path (Cerebras) with its own rate limit that does not count against standard Codex limits.

Partnership with Cerebras

Codex-Spark is the first outcome of OpenAI's partnership with Cerebras Systems, announced in January 2026. Sean Lie, CTO and co-founder of Cerebras, described the collaboration as an attempt to discover what new interaction patterns, use cases and fundamentally different model experiences become possible with fast inference. GPUs remain foundational to OpenAI's training and the bulk of its inference. Cerebras complements that architecture with a low-latency tier for workflows where responsiveness matters as much as intelligence.

Latency optimizations

Together with the Codex-Spark release, OpenAI rolled out end-to-end latency improvements across the entire Codex serving stack: a persistent WebSocket connection, optimizations inside the Responses API and a rewritten response-streaming path. The combined effect is 80% lower per-roundtrip overhead, 30% lower per-token overhead and 50% lower time-to-first-token. These improvements will gradually become the default for all Codex models.
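
As a rough illustration of how the three reductions combine, the sketch below applies them to a simple additive overhead model. The model itself (time-to-first-token plus per-roundtrip and per-token costs) and every baseline number are assumptions made for this example; OpenAI published only the percentage reductions, not absolute latencies.

```python
from dataclasses import dataclass

# Fractional reductions quoted in the text above.
ROUNDTRIP_CUT = 0.80
PER_TOKEN_CUT = 0.30
TTFT_CUT = 0.50


@dataclass(frozen=True)
class Overheads:
    per_roundtrip_ms: float
    per_token_ms: float
    time_to_first_token_ms: float


def apply_reductions(base: Overheads) -> Overheads:
    """Scale each overhead component by its quoted reduction."""
    return Overheads(
        per_roundtrip_ms=base.per_roundtrip_ms * (1 - ROUNDTRIP_CUT),
        per_token_ms=base.per_token_ms * (1 - PER_TOKEN_CUT),
        time_to_first_token_ms=base.time_to_first_token_ms * (1 - TTFT_CUT),
    )


def serving_overhead_ms(o: Overheads, roundtrips: int, tokens: int) -> float:
    """Simplified additive model of serving overhead for one agent turn."""
    return (
        o.time_to_first_token_ms
        + roundtrips * o.per_roundtrip_ms
        + tokens * o.per_token_ms
    )


# Hypothetical baseline values, chosen only to make the arithmetic visible.
BASELINE = Overheads(per_roundtrip_ms=100.0, per_token_ms=1.0,
                     time_to_first_token_ms=400.0)
IMPROVED = apply_reductions(BASELINE)

before = serving_overhead_ms(BASELINE, roundtrips=5, tokens=500)
after = serving_overhead_ms(IMPROVED, roundtrips=5, tokens=500)
print(f"before: {before:.0f} ms, after: {after:.0f} ms")
```

With these made-up baselines the overhead drops from 1400 ms to about 650 ms. The roundtrip reduction matters most for agentic turns that issue many tool calls, while the per-token reduction dominates for long outputs.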

Classification
LLM, Tool-using model
Family: GPT
Access & deployment
API, Hosted
Cloud
Weights: Closed
Key parameters
Context: 128K
Tools: yes
Input: text

Technical specification

Context window: 128K tokens
License: Proprietary (OpenAI Terms of Use)
Hardware requirements: Requires ultra-low-latency inference hardware: served by OpenAI on the Cerebras Wafer-Scale Engine 3 (CS-3); not available for local deployment.
Features: Tool use
Modalities
Input: text
Output: text, code

Capabilities and applications

Native model capabilities
Coding
Generating, analysing and modifying source code.
Category: coding
Real-time inference
The model's ability to generate responses at very high speed (>1000 tokens/sec) and with low latency on specialized inference hardware (e.g. Cerebras WSE), enabling interactive, turn-by-turn collaboration with a human.
Category: coding
Reasoning
The model's ability to reason logically and solve complex problems.
Category: reasoning
Agentic capability
The model's ability to autonomously plan and execute multi-step tasks by sequentially using tools, maintaining context, and adapting to intermediate results.
Category: planning
Function Calling
Category: planning
Computer use
The model's ability to operate a computer interface by interpreting screenshots and generating actions such as clicks, typing, and navigating applications.
Category: planning
Long context
Maintaining coherence and focus across very long input context.
Category: language
Planning
Forming and executing action plans for complex tasks.
Category: planning
Application domains

Benchmark results

2 benchmarks
SWE-Bench Pro
pass@1 · Agentic software engineering, multi-language (Python/Java/JS/Go), Codex harness.
Result: higher than GPT-5.1-Codex-mini
12 Feb 2026 · Source: OpenAI announcement (Feb 12, 2026), "Introducing GPT-5.3-Codex-Spark"
OpenAI did not disclose the exact pass@1 number for Codex-Spark. The announcement only compares it to GPT-5.1-Codex-mini and stresses that Codex-Spark completes tasks in a fraction of the time of GPT-5.3-Codex.
Terminal-Bench 2.0
task completion rate · Agentic terminal work (shell commands, debugging, tests).
Result: higher than GPT-5.1-Codex-mini
12 Feb 2026 · Source: OpenAI announcement (Feb 12, 2026), "Introducing GPT-5.3-Codex-Spark"
Codex-Spark scores higher than GPT-5.1-Codex-mini while using significantly less time than GPT-5.3-Codex.
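
For readers unfamiliar with the pass@1 metric listed for SWE-Bench Pro, the sketch below shows how it is conventionally computed: the fraction of tasks solved on the first attempt. The function name and the sample results are hypothetical, and the exact scoring procedure of the Codex harness is an assumption here, since no Codex-Spark score was published.

```python
def pass_at_1(first_attempt_passed: list[bool]) -> float:
    """Fraction of tasks whose first (and only) attempt passed all checks."""
    if not first_attempt_passed:
        raise ValueError("need at least one task result")
    return sum(first_attempt_passed) / len(first_attempt_passed)


# Hypothetical outcomes for four tasks; not real benchmark data.
results = [True, False, True, True]
print(f"pass@1 = {pass_at_1(results):.2f}")
```

Because pass@1 ignores how long each task took, it does not capture the speed advantage the announcement emphasizes. That is why the comparison above pairs the score with completion time.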

Technical architecture