GPT-4.1

gpt-4.1-2025-04-14 · Family: GPT

GPT-4.1 is an OpenAI API model released April 14, 2025. Features a 1M token context window, 54.6% on SWE-bench Verified, and precise literal instruction following. Designed for developers building agentic coding workflows.

✓ Active✓ Public accessLLMMultimodalTool-using model📁 GPT

Context window

1M tokens

tokens

Parameters

Undisclosed

parameters

Max output

32,768

tokens

Release date

14 April 2025

🏢OpenAIProducer

Access:APIHostedDeployment:☁ Cloud

Overview

GPT-4.1 is an OpenAI language model released on April 14, 2025, available exclusively via API (not in ChatGPT at launch). API snapshot: gpt-4.1-2025-04-14. It was designed for developers building agentic coding systems, with emphasis on instruction following and long-context tasks.

Key features

Context window of 1,047,576 tokens (1M); maximum output tokens: 32,768. Knowledge cutoff: June 2024. Supports tool use, fine-tuning, and multimodal input (text, image, documents).

Benchmark results

On SWE-bench Verified, it achieves 54.6% (conservative score 52.1%) — an improvement of 21.4 pp. over GPT-4o (33.2%). On Aider Polyglot diff, it scores 52.9% (2.9× better than GPT-4o). MMLU 90.2%, MMMU 74.8%, MathVista 72.2%, Video-MME (long) 72.0%. On the Needle in Haystack test at 1M token context — 100% recall.

Pricing and availability

Closed-weights model, available through the OpenAI API, Azure AI Foundry, and other hosting platforms. Pricing: $2/MTok input, $8/MTok output, cached input $0.50/MTok (75% discount). Batch API with 50% discount. No price premium for long context up to 1M tokens. Model retired from ChatGPT on February 13, 2026; still available via API.

Safety

OpenAI did not publish a separate system card, classifying the model as non-frontier. Independent research (Owain Evans/Oxford, ICML 2025; SplxAI) identified elevated misalignment risk after fine-tuning on unsafe code, as well as a tendency toward literal, more easily circumvented instruction following. In response, OpenAI published a dedicated prompting guide.

Classification

LLMMultimodalTool-using model

Family: GPT

Applications

Coding Document analysis Chatbots Content generation Data analysis Summarization Translation

Access & deployment

APIHosted

Cloud

Weights: Closed

Key parameters

📏 Context: 1M tokens

🧩 Parameters: Undisclosed

✓ Tools · ✓ Fine-tuning

📥 Input: text, image, structured data, urls…

Technical specification

Context window

1M tokens

tokens

Parameters

Undisclosed

parameters

Max output tokens

32,768

tokens per response

Knowledge cutoff

1 Jun 2024

Knowledge boundary

License

Proprietary (OpenAI API license)

Hardware requirements

The model is not available for local deployment. It operates exclusively through OpenAI's infrastructure and is accessible via API.

Features:✓ Tool use✓ Fine-tuning

Modalities

⬇ Input

textimagestructured_dataurlsdocuments

⬆ Output

analytical_reportscodestructured_datasummariestext

Capabilities and applications

Native model capabilities

Reasoning

The model's ability to reason logically and solve complex problems.

Category: reasoning

Multi-step reasoning

Carrying out multi-step chains of reasoning across long, complex tasks.

Category: reasoning

Long context

Maintaining coherence and focus across very long input context.

Category: language

Coding

Generating, analysing and modifying source code.

Category: coding

Function Calling

Category: planning

Structured output

Producing data in structured formats such as JSON.

Category: structured_generation

Image understanding

Analysing and interpreting the content of images.

Category: vision

Chart understanding

Reading and interpreting charts, tables and diagrams.

Category: vision

OCR

Recognising text within images and documents.

Category: vision

Multilingual

Understanding and generating text in many languages.

Category: language

Planning

Forming and executing action plans for complex tasks.

Category: planning

Streaming output

Category: reasoning

Application domains

Coding Document analysis Chatbots Content generation Data analysis Summarization Translation

Benchmark results

13 benchmarks

MMLU

accuracy · Massive Multitask Language Understanding benchmark covering 57 subject areas.

90.2%

📅 14 Apr 2025📄 RD World Online / OpenAI (prezentacja premiery)

Score reported by OpenAI during the launch livestream.

SWE-bench Verified

accuracy · Benchmark of real-world software engineering tasks sourced from GitHub. 23 out of 500 tasks that could not be executed on OpenAI infrastructure were excluded. Conservative score (with infrastructure): 52.1%.

54.6%

📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1 (openai.com/index/gpt-4-1/)

Improvement of 21.4 pp. over GPT-4o (33.2%) and 26.6 pp. over GPT-4.5 (28.0%). Outperforms o1 and o3-mini on this benchmark. Claude 3.7 Sonnet (~62–63%) and Gemini 2.5 Pro (~64%) achieved higher scores.

MultiChallenge

accuracy · Scale AI benchmark testing instruction-following in multi-turn conversations (4 categories of information from previous messages).

38.3%

📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1 / Scale AI

Improvement of 10.5 pp. over GPT-4o (27.8%). GPT-4.5 scored 43.8% on this benchmark.

IFEval

accuracy · Benchmark testing compliance with verifiable instructions (format, length, content, avoiding specific phrases).

87.4%

📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1

Improvement of 6.4 pp. over GPT-4o (81.0%). GPT-4.5 scored 88.2%.

Video-MME (long, no subtitles)

accuracy · Multiple-choice questions based on 30–60-minute video recordings without subtitles.

72.0%

📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1

State-of-the-art result at launch. An improvement of 6.7 pp. over GPT-4o (65.3%).

MMMU

accuracy · Multimodal academic reasoning tasks involving charts, diagrams, maps, and similar visual content.

74.8%

📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1

GPT-4o: 68.7%, GPT-4.5: 75.2%. Marginally lower than GPT-4.5.

MathVista

accuracy · Visual mathematical reasoning.

72.2%

📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1 / Pankaj Rajan / Medium

GPT-4o: 61.4%, GPT-4.5: 72.3%. Comparable result to GPT-4.5 at significantly lower cost.

Aider Polyglot (diff format)

accuracy · Benchmark for code editing in diff format across multiple programming languages.

52.9%

📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1

Improvement of over 2.9× compared to GPT-4o (18.2%). GPT-4.5: 44.9%, o3-mini-high: 60.4%. Reduction of unnecessary edits from 9% (GPT-4o) to 2%.

OpenAI-MRCR (2-needle, 128K)

accuracy · Multi-Round Coreference – locating 2 hidden answers within a 128K-token context. GPT-4o: 31.9%, GPT-4.5: 38.5%.

57.2%

📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1

OpenAI open-sourced this benchmark. Performance drops from ~84% at 8K tokens to ~50% at 1M tokens (officially acknowledged degradation).

Graphwalks (BFS <128K)

accuracy · Multi-hop reasoning in long contexts (breadth-first search). GPT-4o: 41.7%, GPT-4.5: 72.3%, o1-high: 62.0%.

61.7%

📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1

Improvement of 19.7 pp. over GPT-4o. Performance close to o1-high, below GPT-4.5.

Needle in Haystack (1M tokens)

accuracy · Retrieving a single hidden piece of information at each position within the context window (up to 1M tokens).

100%

📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1 / Helicone

100% precision across all positions and all context lengths.

OpenAI Internal Instruction Following

accuracy · Internal OpenAI benchmark for measuring instruction following. GPT-4o: 29%.

49%

📅 14 Apr 2025📄 TechTarget / OpenAI launch event

Approximately 20 pp. improvement over GPT-4o on an internal instruction-following benchmark.

SWE-bench Verified (conservative / infrastructure-excluded)

accuracy · SWE-bench Verified variant excluding 23 tasks that could not be executed on OpenAI infrastructure.

52.1%

📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1 (przypis [2])

Result is conservative, confirmed by OpenAI as an alternative metric.

Pricing

Deployment and security

🔒 Security / Enterprise

✓ Verified enterprise information

OpenAI publishes security and enterprise documentation for its platform, covering the API and ChatGPT Enterprise/Business/Edu offerings. For GPT-4.1, security information is platform-level rather than a dedicated per-model safety sheet. Publicly documented aspects include data encryption, access controls, compliance certifications, and policies on the use of customer data for model training.

Security information for GPT-4.1 should be treated as pertaining to the OpenAI API environment and enterprise products, rather than as a model-specific security specification. In practice, this is the appropriate approach for an AI systems catalog.

Updated: 15 Mar 2026↗ Security documentation

Sources and related pages

15 sources

DocsGPT-4.1 model documentationplatform.openai.com RepoOpenAI developer platformplatform.openai.com BlogIntroducing GPT-4.1 in the API – OpenAIopenai.com DocsGPT-4.1 Model – OpenAI API Docsplatform.openai.com DocsCompare models – OpenAI API (GPT-4.1 specs)platform.openai.com DocsGPT-4.1 Prompting Guide – OpenAI Cookbookdevelopers.openai.com DocsOpenAI Pricingplatform.openai.com DocsOpenAI Deprecationsplatform.openai.com BlogRetiring GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini in ChatGPT – OpenAIopenai.com WebAnnouncing GPT-4.1 for Azure AI Foundry – Microsoft Azure Blogazure.microsoft.com WebOpenAI ships GPT-4.1 without a safety report – TechCrunchtechcrunch.com WebOpenAI's GPT-4.1 may be less aligned – TechCrunchtechcrunch.com WebGPT-4.1 – Wikipediaen.wikipedia.org RepoOpenAI MRCR – Hugging Face Datasethuggingface.co WebOpenAI Safety Evaluations Hubopenai.com

Browse related topics

📁 GPT 🌐 Coding 🌐 Document analysis 🌐 Chatbots 🌐 Content generation All llm models All multimodal model models