GPT-5.3-Codex-Spark

5.3-Codex-Spark · Family: GPT

A smaller, ultra-fast variant of GPT-5.3-Codex designed for real-time interactive coding. It runs on the Cerebras WSE-3 and generates over 1,000 tokens per second.

⏳ Preview · ⏳ Limited access · LLM · Tool-using model · 📁 GPT

Context window: 128K tokens

Release date: 12 February 2026

Producer: 🏢 OpenAI · Technology partner: 🤝 Cerebras

Access: API, Hosted · Deployment: ☁ Cloud

Overview

GPT-5.3-Codex-Spark is a research preview of OpenAI's ultra-fast coding model, announced on 12 February 2026. It is a smaller variant of the flagship GPT-5.3-Codex and OpenAI's first model designed specifically for real-time work with the Codex coding agent. The model is served on the Cerebras Wafer-Scale Engine 3 accelerator, which lets it generate over 1,000 tokens per second, fast enough to change how interacting with the model feels.
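To make the throughput figure concrete, the following minimal sketch converts a decode rate into wall-clock streaming time. The 1,000 tokens per second rate comes from the text above; the 50 tokens per second comparison rate, the 400-token edit size and the function name `generation_seconds` are illustrative assumptions, not published figures. The calculation also ignores time-to-first-token and network overhead.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float = 1000.0) -> float:
    """Return the time needed to stream num_tokens at a steady decode rate."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# 50 tok/s is a hypothetical slower rate used only for comparison.
for rate in (50.0, 1000.0):
    seconds = generation_seconds(400, rate)
    print(f"{rate:>6.0f} tok/s: {seconds:.2f} s for a 400-token edit")
```

At the stated rate a 400-token edit streams in about 0.4 seconds; at the hypothetical 50 tokens per second the same edit would take about 8 seconds.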

Technical profile

Codex-Spark has a 128K context window and is text-only at launch. It is tuned to make precise, minimal code edits and does not run tests automatically unless asked. The model is steerable in real time: you can interrupt it, redirect it, ask questions and see results almost instantly.
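A caller may want to check how much of the context window a prompt leaves for output. The sketch below rests on two assumptions that the source does not confirm: that "128K" means 128,000 tokens, and that prompt and output share a single window. The helper names `output_budget` and `fits` are hypothetical, and token counts are taken as already measured by whatever tokenizer the caller uses.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # assumption: "128K" read as 128,000 tokens


def output_budget(prompt_tokens: int, context_window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Tokens left for the response, assuming prompt and output share one window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_window - prompt_tokens, 0)


def fits(prompt_tokens: int, reserved_output_tokens: int,
         context_window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True if the prompt still leaves room for the reserved output size."""
    return output_budget(prompt_tokens, context_window) >= reserved_output_tokens


print(output_budget(100_000))   # tokens remaining after a 100,000-token prompt
print(fits(125_000, 8_000))     # whether an 8,000-token reply would still fit
```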

Benchmarks

According to OpenAI, Codex-Spark scores higher than GPT-5.1-Codex-mini on SWE-Bench Pro and Terminal-Bench 2.0 while finishing the same tasks in a fraction of the time required by GPT-5.3-Codex. Codex-Spark is the first member of an ultra-fast model family from OpenAI.

Availability

Codex-Spark is available as a research preview for ChatGPT Pro users in the Codex app, CLI and VS Code extension. API access is rolling out to a small set of design partners. The model runs on a separate ultra-low-latency path (Cerebras) with its own rate limit that does not count against standard Codex limits.

Partnership with Cerebras

Codex-Spark is the first outcome of OpenAI's partnership with Cerebras Systems, announced in January 2026. Sean Lie, CTO and co-founder of Cerebras, described the collaboration as an attempt to discover what new interaction patterns, use cases and fundamentally different model experiences become possible with fast inference. GPUs remain foundational to OpenAI's training and the bulk of its inference — Cerebras complements that architecture with a low-latency tier for workflows where responsiveness matters as much as intelligence.

Latency optimizations

Together with the Codex-Spark release, OpenAI rolled out end-to-end latency improvements across the entire Codex serving stack: a persistent WebSocket connection, optimizations inside the Responses API and a rewritten response-streaming path. The reported combined effect is an 80% reduction in per-roundtrip overhead, a 30% reduction in per-token overhead and a 50% reduction in time-to-first-token. These improvements will gradually become the default for all Codex models.
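The three percentages apply to different parts of a request, so their combined effect depends on the workload. The sketch below applies them to a simple additive cost model. Both the model (each roundtrip pays a fixed overhead plus one time-to-first-token, each token pays a per-token overhead) and the baseline numbers are illustrative assumptions; OpenAI has not published absolute figures, and `Overheads`, `improved` and `session_overhead_ms` are hypothetical names.

```python
from dataclasses import dataclass

# Fraction of each overhead that remains after the quoted reductions
# (80%, 30% and 50% respectively).
REMAINING = {"roundtrip": 0.20, "per_token": 0.70, "ttft": 0.50}


@dataclass(frozen=True)
class Overheads:
    roundtrip_ms: float   # fixed cost per client/server roundtrip
    per_token_ms: float   # serving overhead added to each streamed token
    ttft_ms: float        # time-to-first-token


def improved(baseline: Overheads) -> Overheads:
    """Scale each overhead by the fraction remaining after the reduction."""
    return Overheads(
        roundtrip_ms=baseline.roundtrip_ms * REMAINING["roundtrip"],
        per_token_ms=baseline.per_token_ms * REMAINING["per_token"],
        ttft_ms=baseline.ttft_ms * REMAINING["ttft"],
    )


def session_overhead_ms(o: Overheads, roundtrips: int, tokens: int) -> float:
    """Assumed additive model: per-roundtrip costs plus per-token costs."""
    return roundtrips * (o.roundtrip_ms + o.ttft_ms) + tokens * o.per_token_ms


# Hypothetical baseline values chosen only to make the arithmetic visible.
baseline = Overheads(roundtrip_ms=100.0, per_token_ms=1.0, ttft_ms=400.0)
after = improved(baseline)

before_ms = session_overhead_ms(baseline, roundtrips=10, tokens=2000)
after_ms = session_overhead_ms(after, roundtrips=10, tokens=2000)
print(f"before: {before_ms:.0f} ms, after: {after_ms:.0f} ms")
```

With these made-up baselines, total overhead for 10 roundtrips and 2,000 tokens falls from roughly 7,000 ms to roughly 3,600 ms. Sessions with many short roundtrips gain more than long single-shot generations, because the largest reduction applies to the per-roundtrip term.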

Classification

LLM · Tool-using model

Family: GPT

Applications

Coding · Workflow automation

Access & deployment

API · Hosted

Cloud

Weights: Closed

Key parameters

📏 Context: 128K

✓ Tools

📥 Input: text

Technical specification

Context window: 128K tokens

License

Proprietary (OpenAI Terms of Use)

Hardware requirements

Requires ultra-low-latency inference hardware: served by OpenAI on Cerebras CS-3 systems built around the Wafer-Scale Engine 3 (WSE-3); not available for local deployment.

Features: ✓ Tool use

Modalities

⬇ Input: text

⬆ Output: text, code

Capabilities and applications

Native model capabilities

Coding

Generating, analysing and modifying source code.

Category: coding

Real-time inference

The model's ability to generate responses at very high speed (over 1,000 tokens per second) and with low latency on specialized inference hardware (e.g. Cerebras WSE), enabling interactive, turn-by-turn collaboration with a human.

Category: coding

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Agentic capability

The model's ability to autonomously plan and execute multi-step tasks by sequentially using tools, maintaining context, and adapting to intermediate results.

Category: planning

Function Calling

Category: planning

Computer use

The model's ability to operate a computer interface by interpreting screenshots and generating actions such as clicks, typing, and navigating applications.

Category: planning

Long context

Maintaining coherence and focus across very long input context.

Category: language

Planning

Forming and executing action plans for complex tasks.

Category: planning

Application domains

Coding · Workflow automation

Benchmark results

2 benchmarks

SWE-Bench Pro

pass@1 · Agentic software engineering, multi-language (Python/Java/JS/Go), Codex harness.

Result: higher than GPT-5.1-Codex-mini

📅 12 Feb 2026 · 📄 OpenAI announcement (12 Feb 2026): Introducing GPT-5.3-Codex-Spark

OpenAI did not disclose the exact pass@1 number for Codex-Spark — the announcement only compares it to GPT-5.1-Codex-mini and stresses that Codex-Spark completes tasks in a fraction of the time of GPT-5.3-Codex.

Terminal-Bench 2.0

task completion rate · Agentic terminal work (shell commands, debugging, tests).

Result: higher than GPT-5.1-Codex-mini

📅 12 Feb 2026 · 📄 OpenAI announcement (12 Feb 2026): Introducing GPT-5.3-Codex-Spark

Codex-Spark scores higher than GPT-5.1-Codex-mini with a significantly smaller time budget than GPT-5.3-Codex.

Technical architecture

Core Architecture

Transformer

Model Form

LLM · Tool-augmented LLM

Training Techniques

Instruction Tuning · Reasoning RL

Sources and related pages

4 sources

Blog: Introducing GPT-5.3-Codex-Spark (OpenAI, openai.com)
Blog: Introducing OpenAI GPT-5.3-Codex-Spark Powered by Cerebras (cerebras.ai)
Blog: Introducing GPT-5.3-Codex (OpenAI, openai.com)
Docs: Cerebras Wafer-Scale Engine 3 (cerebras.ai)
