OpenAI and Broadcom unveil Jalapeño — a custom inference chip built for LLMs

On June 24, 2026, OpenAI and Broadcom jointly announced their first custom processor designed exclusively for large language model inference. The chip, called Jalapeño, is an application-specific integrated circuit (ASIC) tailored to the architecture of modern LLMs. Its development took just nine months, roughly the time a single design review cycle typically takes in conventional semiconductor development.

Key takeaways

Jalapeño is an ASIC built solely for LLM inference, not a general-purpose GPU
Development timeline: nine months from concept to fabrication readiness
Estimated inference cost reduction: approximately 50% versus current alternatives
OpenAI used its own models to accelerate parts of the chip design process
Testing on GPT-5.3-Codex-Spark is already underway in a near-production environment

Nine months from schematics to silicon

A standard commercial chip development cycle runs two to five years. OpenAI and Broadcom compressed that to nine months: the partnership was publicly announced only in October 2025, yet by June 2026 the chip was being tested in a near-production environment. Both companies attribute this pace to deep software-hardware co-development, in which the chip architecture was shaped by direct knowledge of the workloads OpenAI runs daily across ChatGPT, Codex, and the API.

OpenAI's own models played a key role. Prior-generation models were used to automate parts of the design process, shortening the iterations between concept and finished silicon. An OpenAI spokesperson declined to specify which models were used.

Jalapeño is an ASIC — a chip designed for one narrow purpose. Unlike Nvidia's GPUs, which handle dozens of workload types, Jalapeño is optimized exclusively for serving already-trained language models. That means it will not be useful for model training — OpenAI will continue relying on Nvidia hardware, AWS Trainium, and other partners for that side of the stack.

Inference economics as a strategic driver

For OpenAI, inference is the main driver of operating costs. Every ChatGPT response, every API call, and every Codex invocation incurs compute cost on the inference side. In 2025, OpenAI generated $13.07 billion in revenue, but operating expenses reached $34 billion, with R&D alone, driven by infrastructure, consuming $19.18 billion. The company was paying Microsoft over $10.59 billion per year for compute infrastructure alone.

If Jalapeño actually cuts inference costs by approximately 50% (per Bloomberg data), the impact on the bottom line will be direct. Greg Brockman, OpenAI co-founder and president, described the chip's results plainly: "This is a real performance improvement — both in performance per watt and performance per dollar." On CNBC, Brockman and Broadcom CEO Hock Tan confirmed that testing is already underway on GPT-5.3-Codex-Spark in a near-production environment.
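To put the figures in this section side by side, here is a minimal Python sketch of the arithmetic. It uses only the 2025 numbers and the approximate 50% reduction quoted in this article. The inference_spend argument is a hypothetical input, since the article does not break out how much OpenAI spends on inference specifically, and the function name summarize is likewise illustrative.

```python
# Figures quoted in the article for 2025, in billions of US dollars.
REVENUE = 13.07
OPERATING_EXPENSES = 34.0
RND_EXPENSES = 19.18
MICROSOFT_COMPUTE = 10.59  # the article says "over" this amount, so treat it as a floor

# Approximate inference cost reduction reported for Jalapeño.
COST_REDUCTION = 0.50


def summarize(inference_spend: float) -> dict:
    """Return rounded ratios plus the savings implied by an assumed inference spend.

    inference_spend is hypothetical: the article gives no inference-only figure.
    """
    return {
        # How far operating expenses exceeded revenue.
        "operating_gap": round(OPERATING_EXPENSES - REVENUE, 2),
        # Fraction of operating expenses that went to R&D.
        "rnd_share_of_opex": round(RND_EXPENSES / OPERATING_EXPENSES, 3),
        # Microsoft compute payments relative to revenue (a lower bound).
        "compute_share_of_revenue": round(MICROSOFT_COMPUTE / REVENUE, 3),
        # Savings if the assumed spend really fell by COST_REDUCTION.
        "annual_savings": round(inference_spend * COST_REDUCTION, 2),
    }


if __name__ == "__main__":
    # 10.0 is an illustrative placeholder, not a reported number.
    for name, value in summarize(inference_spend=10.0).items():
        print(f"{name}: {value}")
```

On the quoted figures, operating expenses exceeded revenue by about $20.93 billion, R&D accounted for roughly 56% of operating expenses, and the Microsoft compute payments alone were equal to at least 81% of revenue. Any savings estimate depends entirely on the inference spend assumed.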

Broadcom is responsible for core silicon implementation and networking technology, including Tomahawk networking silicon, while Celestica handles board, rack, and system integration. OpenAI's existing investment partnerships ($30 billion from Nvidia, $50 billion from Amazon) remain intact. Jalapeño does not replace those relationships; it adds a proprietary hardware layer on the inference side.

Market context: Google, Amazon, and the global chip race

Google has used Tensor Processing Units (TPUs) for years, Amazon developed Trainium, and Microsoft launched Maia 100 in late 2023 followed by Maia 200 in January 2026 — the latter already serving GPT-5.2 models in Azure. Meta operates its own MTIA series.

Jalapeño moves OpenAI closer to the operating model of those companies, which own their stack from the chip up. In its announcement, OpenAI described this as full control over "the infrastructure underneath" its models: chip architecture, kernels, memory systems, networking, scheduling, and deployment.

Chinese firms are also competing in this race: Alibaba unveiled the Zhenwu M890 chip targeting agentic AI workloads with large context windows, Huawei is preparing the Ascend 950DT, and ByteDance is reportedly in negotiations with Qualcomm to design a custom ASIC. US export restrictions on AI chips to China are indirectly accelerating these efforts.

Why this matters

For its first years, OpenAI depended on others: on Microsoft for training compute, on Nvidia for GPUs, on AWS for infrastructure. Jalapeño is the first sign that the company is building its own economic foundation: control over inference costs, which are directly tied to the price of the end product.

The broader market implication is significant. Until now, only large cloud platforms could afford multi-year chip programs. The fact that OpenAI — despite running operating losses — chose to enter this space with a nine-month sprint suggests that design cycle compression, assisted by AI models, is beginning to change the economics of semi-custom silicon. That may open the door for other AI companies to take similar steps sooner than the industry expected.

What's next

OpenAI plans to deploy Jalapeño across active data centers by the end of 2026, per the official June 24 announcement
The roadmap includes gigawatt-scale data centers with Microsoft and other partners, where Jalapeño is positioned as the central component of the inference layer
The chip is designed with future LLMs broadly in mind, and Broadcom has explicitly signaled plans to offer Jalapeño to external AI companies as a commercial product