GPT-4.1 is an OpenAI API model released April 14, 2025. Features a 1M token context window, 54.6% on SWE-bench Verified, and precise literal instruction following. Designed for developers building agentic coding workflows.
Context window
1M tokens
tokens
Parameters
Undisclosed
parameters
Max output
32,768
tokens
Release date
14 April 2025
Access:APIHostedDeployment:☁ Cloud
Overview
Access & deployment
APIHosted
Cloud
Weights: Closed
Key parameters
📏 Context: 1M tokens
🧩 Parameters: Undisclosed
✓ Tools · ✓ Fine-tuning
📥 Input: text, image, structured data, urls…
Technical specification
Context window
1M tokens
tokens
Parameters
Undisclosed
parameters
Max output tokens
32,768
tokens per response
Knowledge cutoff
1 Jun 2024
Knowledge boundary
License
Proprietary (OpenAI API license)
Hardware requirements
The model is not available for local deployment.
It operates exclusively through OpenAI's infrastructure and is accessible via API.
Features:✓ Tool use✓ Fine-tuning
Modalities
⬇ Input
textimagestructured_dataurlsdocuments
⬆ Output
analytical_reportscodestructured_datasummariestext
Capabilities and applications
Native model capabilities
Reasoning
Category: reasoning
Multi-step reasoning
Category: reasoning
Long context
Category: reasoning
Coding
Category: coding
Function Calling
Category: planning
Structured output
Category: structured_generation
Image understanding
Category: vision
Chart understanding
Category: vision
OCR
Category: vision
Multilingual
Category: language
Planning
Category: planning
Streaming output
Category: reasoning
Benchmark results
13 benchmarks
MMLU
accuracy · Massive Multitask Language Understanding benchmark covering 57 subject areas.
90.2%
📅 14 Apr 2025📄 RD World Online / OpenAI (prezentacja premiery)
Score reported by OpenAI during the launch livestream.
SWE-bench Verified
accuracy · Benchmark of real-world software engineering tasks sourced from GitHub. 23 out of 500 tasks that could not be executed on OpenAI infrastructure were excluded. Conservative score (with infrastructure): 52.1%.
54.6%
📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1 (openai.com/index/gpt-4-1/)
Improvement of 21.4 pp. over GPT-4o (33.2%) and 26.6 pp. over GPT-4.5 (28.0%). Outperforms o1 and o3-mini on this benchmark. Claude 3.7 Sonnet (~62–63%) and Gemini 2.5 Pro (~64%) achieved higher scores.
MultiChallenge
accuracy · Scale AI benchmark testing instruction-following in multi-turn conversations (4 categories of information from previous messages).
38.3%
📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1 / Scale AI
Improvement of 10.5 pp. over GPT-4o (27.8%). GPT-4.5 scored 43.8% on this benchmark.
IFEval
accuracy · Benchmark testing compliance with verifiable instructions (format, length, content, avoiding specific phrases).
87.4%
📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1
Improvement of 6.4 pp. over GPT-4o (81.0%). GPT-4.5 scored 88.2%.
Video-MME (long, no subtitles)
accuracy · Multiple-choice questions based on 30–60-minute video recordings without subtitles.
72.0%
📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1
State-of-the-art result at launch. An improvement of 6.7 pp. over GPT-4o (65.3%).
MMMU
accuracy · Multimodal academic reasoning tasks involving charts, diagrams, maps, and similar visual content.
74.8%
📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1
GPT-4o: 68.7%, GPT-4.5: 75.2%. Marginally lower than GPT-4.5.
MathVista
accuracy · Visual mathematical reasoning.
72.2%
📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1 / Pankaj Rajan / Medium
GPT-4o: 61.4%, GPT-4.5: 72.3%. Comparable result to GPT-4.5 at significantly lower cost.
Aider Polyglot (diff format)
accuracy · Benchmark for code editing in diff format across multiple programming languages.
52.9%
📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1
Improvement of over 2.9× compared to GPT-4o (18.2%). GPT-4.5: 44.9%, o3-mini-high: 60.4%. Reduction of unnecessary edits from 9% (GPT-4o) to 2%.
OpenAI-MRCR (2-needle, 128K)
accuracy · Multi-Round Coreference – locating 2 hidden answers within a 128K-token context. GPT-4o: 31.9%, GPT-4.5: 38.5%.
57.2%
📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1
OpenAI open-sourced this benchmark. Performance drops from ~84% at 8K tokens to ~50% at 1M tokens (officially acknowledged degradation).
Graphwalks (BFS <128K)
accuracy · Multi-hop reasoning in long contexts (breadth-first search). GPT-4o: 41.7%, GPT-4.5: 72.3%, o1-high: 62.0%.
61.7%
📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1
Improvement of 19.7 pp. over GPT-4o. Performance close to o1-high, below GPT-4.5.
Needle in Haystack (1M tokens)
accuracy · Retrieving a single hidden piece of information at each position within the context window (up to 1M tokens).
100%
📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1 / Helicone
100% precision across all positions and all context lengths.
OpenAI Internal Instruction Following
accuracy · Internal OpenAI benchmark for measuring instruction following. GPT-4o: 29%.
49%
📅 14 Apr 2025📄 TechTarget / OpenAI launch event
Approximately 20 pp. improvement over GPT-4o on an internal instruction-following benchmark.
SWE-bench Verified (conservative / infrastructure-excluded)
accuracy · SWE-bench Verified variant excluding 23 tasks that could not be executed on OpenAI infrastructure.
52.1%
📅 14 Apr 2025📄 OpenAI – oficjalny blog gpt-4-1 (przypis [2])
Result is conservative, confirmed by OpenAI as an alternative metric.
Pricing
Deployment and security
🔒 Security / Enterprise
✓ Verified enterprise information
OpenAI publishes security and enterprise documentation for its platform, covering the API and ChatGPT Enterprise/Business/Edu offerings. For GPT-4.1, security information is platform-level rather than a dedicated per-model safety sheet. Publicly documented aspects include data encryption, access controls, compliance certifications, and policies on the use of customer data for model training.
Security information for GPT-4.1 should be treated as pertaining to the OpenAI API environment and enterprise products, rather than as a model-specific security specification. In practice, this is the appropriate approach for an AI systems catalog.
Updated: 15 Mar 2026↗ Security documentation
Sources and related pages
15 sources
DocsGPT-4.1 model documentationRepoOpenAI developer platformBlogIntroducing GPT-4.1 in the API – OpenAIDocsGPT-4.1 Model – OpenAI API DocsDocsCompare models – OpenAI API (GPT-4.1 specs)DocsGPT-4.1 Prompting Guide – OpenAI CookbookDocsOpenAI PricingDocsOpenAI DeprecationsBlogRetiring GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini in ChatGPT – OpenAIWebAnnouncing GPT-4.1 for Azure AI Foundry – Microsoft Azure BlogWebOpenAI ships GPT-4.1 without a safety report – TechCrunchWebOpenAI's GPT-4.1 may be less aligned – TechCrunchWebGPT-4.1 – WikipediaRepoOpenAI MRCR – Hugging Face DatasetWebOpenAI Safety Evaluations Hub
