
OpenAI's first custom chip: Jalapeño targets inference costs

Manaal Khan · June 27, 2026 at 7:17 PM · 5 min read

Key Takeaways

  • OpenAI's Jalapeño chip is purpose-built for LLM inference, not a modified general-purpose processor
  • Microsoft has reportedly agreed to purchase 40% of initial chip production
  • The 9-month development cycle is unusually fast for custom silicon, with OpenAI crediting its own AI models for accelerating design

OpenAI has entered the custom silicon business. The company unveiled Jalapeño, an inference-focused chip developed with Broadcom, marking its first piece of proprietary hardware after years of running on NVIDIA GPUs. Large-scale deployment is planned for late 2026, with Microsoft reportedly committing to buy 40% of initial production.

The chip is designed specifically for running large language models in production, not for training them. That distinction matters. Inference accounts for the bulk of operational costs when AI models serve millions of users. Every percentage point of efficiency translates directly into margin.

Why is OpenAI building its own chip now?

OpenAI has been paying the NVIDIA tax since its founding. GPUs remain the default hardware for AI workloads, but they are general-purpose parallel processors that grew out of graphics, not chips designed around a single workload. Google figured this out years ago and built TPUs. Amazon has Trainium and Inferentia. Meta is developing MTIA. OpenAI was the conspicuous holdout among major AI labs.

The economics are straightforward. Custom silicon designed for a specific workload eliminates transistors spent on features you do not need. OpenAI says Jalapeño was designed from scratch for modern LLM inference, not adapted from an existing architecture. The company claims the chip cuts data movement and pushes utilization closer to theoretical maximums.

Broadcom CEO Hock Tan and President Charlie Kawwas delivered the first wafer to Sam Altman and Greg Brockman in a ceremonial handoff. Broadcom oversees silicon production and contributes its Tomahawk Ethernet switch chips for networking. Celestica manages boards, racks, and system integration. OpenAI owns the chip design itself.

Performance claims remain unverified

OpenAI says early tests show performance per watt that is "substantially better" than current hardware. Those are self-reported numbers. No independent benchmarks exist. The company has not disclosed which chips Jalapeño was tested against, what tasks were measured, or the test conditions. A technical report is promised but not yet published.
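Performance per watt is simply useful work divided by power drawn, which is why the undisclosed baseline and test conditions matter so much. The sketch below assumes throughput is measured in generated tokens per second and power as average watts over the run; the `Measurement` class, `relative_efficiency` helper, and every number in it are hypothetical placeholders, not OpenAI data.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One benchmark run. Hypothetical fields, not OpenAI's disclosed methodology."""
    name: str
    tokens_per_second: float  # sustained generation throughput
    avg_power_watts: float    # average power drawn during the run

    def tokens_per_joule(self) -> float:
        # Performance per watt for token generation: (tokens/s) / (J/s) = tokens/J
        return self.tokens_per_second / self.avg_power_watts


def relative_efficiency(candidate: Measurement, baseline: Measurement) -> float:
    """Ratio above 1.0 means the candidate does more work per joule than the baseline."""
    return candidate.tokens_per_joule() / baseline.tokens_per_joule()


if __name__ == "__main__":
    # Placeholder numbers for illustration only; no independent benchmarks exist yet.
    baseline = Measurement("baseline_gpu", tokens_per_second=10_000, avg_power_watts=700)
    candidate = Measurement("candidate_asic", tokens_per_second=9_000, avg_power_watts=450)
    print(f"{relative_efficiency(candidate, baseline):.2f}x")  # prints "1.40x"
```

With these made-up figures the candidate is slower in raw throughput yet about 1.4x more efficient per joule. That is exactly the kind of result that shifts depending on which baseline chip, batch size, and power measurement a vendor chooses.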

Engineering samples are already running ML workloads in OpenAI's lab, including the GPT-5.3-Codex-Spark model. That model currently runs on Cerebras hardware, which also specializes in inference. The comparison between Jalapeño and Cerebras will be worth watching once real data emerges.

9 months: the time from design to tape-out, which OpenAI calls the fastest ASIC development cycle for high-performance semiconductors it knows of.

The nine-month timeline stands out. Custom chip development typically takes two to three years. OpenAI credits its own AI models with accelerating parts of the design process. If true, this represents one of the more concrete examples of AI speeding up hardware development. Rumors about OpenAI's chip ambitions have circulated since 2023, so the company had time to prepare before formal development began.

Microsoft's 40% commitment signals confidence

Broadcom reportedly required Microsoft to guarantee purchase of 40% of initial chip production before proceeding with manufacturing at scale. That is not unusual. Chip fabrication requires massive capital investment, and foundries want committed buyers before spinning up production lines.

Microsoft runs Azure OpenAI Service, which provides API access to OpenAI models for enterprise customers. The company has every incentive to reduce inference costs. At current scale, even small efficiency gains add up to significant savings. Microsoft also has its own AI workloads to serve, including Copilot features embedded across Office, Windows, and GitHub.

The planned deployment will operate at "gigawatt scale," according to Broadcom's Tan. For context, a single large data center typically consumes 50 to 100 megawatts. A gigawatt-scale deployment therefore suggests infrastructure spread across multiple facilities, likely co-located with Microsoft Azure data centers.
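The multiple-facility inference is straightforward arithmetic. This sketch assumes "gigawatt scale" means exactly 1 GW of total draw and uses the 50 to 100 MW per-site range quoted above; the `facilities_needed` helper is hypothetical, and it ignores power lost to cooling and other overhead.

```python
import math

GIGAWATT_MW = 1_000  # 1 GW = 1,000 MW (assumed total deployment size)


def facilities_needed(total_mw: float, per_facility_mw: float) -> int:
    # Round up: a partially used site still has to be built
    return math.ceil(total_mw / per_facility_mw)


if __name__ == "__main__":
    for per_site in (50, 100):
        print(f"{per_site} MW per site -> {facilities_needed(GIGAWATT_MW, per_site)} facilities")
    # prints "50 MW per site -> 20 facilities" and "100 MW per site -> 10 facilities"
```

Under these assumptions, 1 GW works out to roughly ten to twenty large sites.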

What does this mean for NVIDIA?

NVIDIA remains dominant in AI hardware. The company's H100 and Blackwell-generation chips power most large-scale AI training and inference today. But the trend is clear: major AI labs are building alternatives for inference workloads where they control both the model architecture and the deployment infrastructure.

Training workloads are harder to optimize with custom silicon because requirements shift as researchers experiment with new architectures. Inference is more predictable. Once a model is trained, its computational patterns are largely fixed. That predictability makes inference an easier target for hardware specialization.

OpenAI will likely continue using NVIDIA GPUs for training while shifting inference to Jalapeño where possible. The company describes this as a multi-generation platform, suggesting Jalapeño is the first in a series of custom chips rather than a one-off project.


The full-stack argument

OpenAI frames custom hardware as part of controlling the full stack from chip to product. The logic: when you design the model, the inference runtime, and the silicon, you can co-optimize in ways that are impossible when buying commodity hardware. Apple made this argument with its M-series chips. Google made it with TPUs.

The risk is distraction. Chip development is expensive, time-consuming, and requires expertise far removed from machine learning research. OpenAI is betting that the cost savings and performance gains justify building an entirely new competency. Given the scale of OpenAI's inference workloads, the math probably works. Smaller AI labs will continue relying on NVIDIA and cloud providers.

Frequently Asked Questions

When will OpenAI's Jalapeño chip be available?

Large-scale deployment is planned for late 2026, primarily through Microsoft Azure and other partners. OpenAI has not announced plans to sell the chip directly.

How does Jalapeño compare to NVIDIA GPUs?

OpenAI claims substantially better performance per watt, but no independent benchmarks exist yet. The chip is optimized specifically for LLM inference rather than general-purpose AI workloads.

Will Jalapeño replace NVIDIA chips at OpenAI?

Partially. OpenAI will likely continue using NVIDIA GPUs for training while shifting inference workloads to Jalapeño where the specialized hardware provides advantages.

What role does Broadcom play in the partnership?

Broadcom oversees silicon production and provides networking technology, including its Tomahawk Ethernet switch chips. OpenAI owns the chip design itself.

Can other companies buy Jalapeño chips?

Microsoft has reportedly committed to buy 40% of initial production. OpenAI mentions "other partners" but has not disclosed whether chips will be available for purchase outside strategic partnerships.


Logicity's Take

For AI builders evaluating inference infrastructure, Jalapeño signals that major AI companies and cloud providers are serious about breaking NVIDIA's grip on inference economics. But this does not change your options today. If you are building products on OpenAI's API, you may eventually benefit from lower costs as OpenAI's infrastructure expenses drop. If you self-host models, the competitive pressure from custom silicon projects (Google TPUs, AWS Inferentia2, now Jalapeño) should push NVIDIA to improve price-performance on its next-gen chips. The practical move: benchmark your inference costs now so you can measure whether promised savings materialize when these chips actually deploy.
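A minimal way to record that baseline is cost per million output tokens, derived from what you pay per hour and the throughput you actually sustain. The `ServingRun` class, its fields, and the example prices below are hypothetical; plug in your own instance pricing and measured throughput.

```python
from dataclasses import dataclass


@dataclass
class ServingRun:
    """Hypothetical record of one self-hosted serving benchmark."""
    hourly_cost_usd: float    # what you pay per hour for the serving hardware
    tokens_per_second: float  # sustained output throughput under your real load
    utilization: float = 1.0  # fraction of each paid hour spent doing useful work

    def cost_per_million_tokens(self) -> float:
        if self.tokens_per_second <= 0:
            raise ValueError("tokens_per_second must be positive")
        if not 0 < self.utilization <= 1:
            raise ValueError("utilization must be in (0, 1]")
        # Idle time still costs money, so it shrinks the tokens you get per paid hour
        tokens_per_hour = self.tokens_per_second * 3600 * self.utilization
        return self.hourly_cost_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Placeholder figures: a $10/hour instance sustaining 2,000 tokens/s at 50% utilization
    run = ServingRun(hourly_cost_usd=10.0, tokens_per_second=2_000, utilization=0.5)
    print(f"${run.cost_per_million_tokens():.2f} per million tokens")  # prints "$2.78 per million tokens"
```

Rerunning the same prompts and load profile on new hardware, then comparing this one number, gives a like-for-like check on whether advertised savings show up in your own workload.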


Source: The Decoder / Maximilian Schreiner


Manaal Khan

Tech & Innovation Writer

Produced with AI assistance and reviewed by the Logicity editorial team. Learn more in our Editorial Policy.