OpenAI o3

OpenAI reasoning model released April 16, 2025 with full tool access in ChatGPT, ability to think with images, and a 200K context window. Succeeded by GPT-5.

✓ Active · ✓ Public access · Reasoning model · Multimodal · LLM · 📁 OpenAI o-series

Context window: 200K tokens

Max output: 100,000 tokens

Release date: 16 April 2025

🏢 Producer: OpenAI

Access: API, Hosted · Deployment: ☁ Cloud

Overview

OpenAI o3 is a reasoning model in the o-series, released on April 16, 2025 alongside o4-mini. It was the first o-series model with full agentic access to every tool inside ChatGPT (web search, Python interpreter, image generation, and file analysis), and it was trained via reinforcement learning to decide when and how to use them. The model also introduced "thinking with images": images become part of the chain of thought and can be manipulated (rotated, zoomed) during reasoning.

In the API, o3 has a 200,000-token context window, a 100,000-token maximum output, and a June 1, 2024 knowledge cutoff. The API identifier is o3 (snapshot o3-2025-04-16). Pricing is USD 2 per 1M input tokens (USD 0.50 for cached input) and USD 8 per 1M output tokens. The model has been succeeded by GPT-5 but remains available via the API. An OpenAI o3-pro variant was also released in June 2025.
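The per-token prices quoted above can be turned into a rough per-request estimate. The following is a minimal sketch, not an official billing formula: the function name `estimate_cost_usd` is hypothetical, and it assumes that cached tokens are a subset of the input tokens and that reasoning tokens are billed as output tokens.

```python
# Prices in USD per 1M tokens, as quoted in the overview.
INPUT_PER_M = 2.00
CACHED_INPUT_PER_M = 0.50
OUTPUT_PER_M = 8.00


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one o3 request (hypothetical helper).

    Assumptions: cached_input_tokens is the part of input_tokens billed at
    the cached rate; reasoning tokens are already counted in output_tokens.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")

    uncached = input_tokens - cached_input_tokens
    total = (uncached * INPUT_PER_M
             + cached_input_tokens * CACHED_INPUT_PER_M
             + output_tokens * OUTPUT_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    # 100K input tokens (40K of them cached) and 10K output tokens.
    print(estimate_cost_usd(100_000, 10_000, cached_input_tokens=40_000))
```

For the request in the usage line, the uncached input contributes USD 0.12, the cached input USD 0.02, and the output USD 0.08.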

Classification

Reasoning model · Multimodal · LLM

Family: OpenAI o-series

Access & deployment

API · Hosted

Cloud

Weights: Closed

Key parameters

📏 Context: 200K

✓ Tools

📥 Input: text, image

Platforms

OpenAI API · Microsoft Azure AI Foundry

Technical specification

Context window: 200K tokens

Max output tokens: 100,000 tokens per response

Knowledge cutoff: 1 Jun 2024 (knowledge boundary)

Features: ✓ Tool use
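The two limits above interact when sizing a request. The sketch below illustrates one way to compute the largest output budget for a given prompt. It assumes that prompt and output tokens share the 200K context window, which is not stated on this page, and `max_output_budget` is a hypothetical helper name.

```python
CONTEXT_WINDOW = 200_000  # tokens, from the specification above
MAX_OUTPUT = 100_000      # tokens per response, from the specification above


def max_output_budget(prompt_tokens: int) -> int:
    """Return the largest output budget that still fits.

    Assumption: prompt and output tokens share the same context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # Capped by the per-response limit, and never negative.
    return max(0, min(MAX_OUTPUT, remaining))


if __name__ == "__main__":
    for prompt in (50_000, 150_000, 250_000):
        print(prompt, max_output_budget(prompt))
```

Under that assumption, a prompt of up to 100K tokens leaves the full 100,000-token output limit available, and longer prompts reduce it one token for each extra prompt token.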

Modalities

⬇ Input: text, image

⬆ Output: text, code

Capabilities and applications

Native model capabilities

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Multi-step reasoning

Carrying out multi-step chains of reasoning across long, complex tasks.

Category: reasoning

Coding

Generating, analysing and modifying code in many programming languages. Covers writing functions, debugging, refactoring, code review, and creating tests. Measured by benchmarks such as HumanEval and SWE-bench.

Category: coding

Long context

Support for large context windows — tens to hundreds of thousands (or millions) of input tokens. Enables analysis of entire codebases, long documents, and many parallel conversations without losing earlier information. GPT-5.1 supports 400,000 tokens.

Category: language

Multilingual

Competence in many natural languages (from a few to over a hundred): understanding, generation, translation, and code-switching within a single conversation. Frontier models support a wide range of languages with comparable quality.

Category: language

Image understanding

Analysing and interpreting the content of images.

Category: vision

Multimodal understanding

Category: multimodal

Function calling

Category: planning

Parallel tool calls

Ability to invoke multiple external tools simultaneously while generating a response.

Category: reasoning
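On the application side, several tool calls returned in one response can be executed concurrently before the results are sent back to the model. The sketch below shows only that client-side dispatch step and makes no API request. The tool functions and the shape of each call (`id`, `name`, and `arguments` as a JSON string) are illustrative assumptions, not a confirmed o3 response format.

```python
import json
from concurrent.futures import ThreadPoolExecutor


# Hypothetical local tools; a real application would register its own.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


def add(a: float, b: float) -> float:
    return a + b


TOOLS = {"get_weather": get_weather, "add": add}


def run_tool_call(call: dict) -> dict:
    """Execute one tool call and pair the result with the call id."""
    func = TOOLS[call["name"]]
    kwargs = json.loads(call["arguments"])  # arguments assumed to be JSON text
    return {"id": call["id"], "output": func(**kwargs)}


def run_parallel(calls: list[dict]) -> list[dict]:
    """Run all tool calls concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        return list(pool.map(run_tool_call, calls))


if __name__ == "__main__":
    calls = [
        {"id": "call_1", "name": "get_weather", "arguments": '{"city": "Oslo"}'},
        {"id": "call_2", "name": "add", "arguments": '{"a": 2, "b": 3}'},
    ]
    print(run_parallel(calls))
```

A thread pool suits tools that mostly wait on I/O, such as HTTP requests. CPU-bound tools would need a different executor.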

Planning

Forming and executing action plans for complex tasks.

Category: planning

Agentic capability

The model's ability to autonomously plan and execute multi-step tasks by sequentially using tools, maintaining context, and adapting to intermediate results.

Category: planning

Computer use

The model's ability to operate a computer interface by interpreting screenshots and generating actions such as clicks, typing, and navigating applications.

Category: planning

Structured output

Producing data in structured formats such as JSON.

Category: structured_generation
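When a model is asked for JSON, the calling code typically still parses and checks the reply before using it. The sketch below shows a minimal check of that kind. The field names and the `parse_structured` helper are hypothetical, and the validation is deliberately simpler than a full JSON Schema check.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"title": str, "year": int}


def parse_structured(reply_text: str) -> dict:
    """Parse a JSON reply and verify the required fields and their types."""
    data = json.loads(reply_text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"wrong type for field: {name}")
    return data


if __name__ == "__main__":
    print(parse_structured('{"title": "OpenAI o3", "year": 2025}'))
```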

Benchmark results

6 benchmarks

Codeforces

Elo rating · High reasoning effort, with tools

2727 points