Key Takeaways

- OpenAI and Broadcom developed Jalapeño, a custom inference ASIC, in just nine months
- The chip is reticle-sized (~840mm²) with six HBM modules, targeting LLM inference workloads
- OpenAI claims better performance-per-watt than current hardware but disclosed no benchmarks
OpenAI now has its own silicon. The company partnered with Broadcom to build Jalapeño, a custom inference processor that went from concept to working engineering samples in nine months. That timeline is aggressive for any chip, let alone a reticle-sized ASIC designed for frontier AI workloads.
The announcement marks OpenAI's entry into custom hardware, joining Google, Amazon, and Microsoft in the race to reduce dependence on Nvidia GPUs. OpenAI frames Jalapeño as the first generation of a broader hardware strategy, not a one-off experiment.

What makes Jalapeño different from other AI accelerators?
OpenAI is careful to position Jalapeño as purpose-built for inference, not a repurposed training chip. Most AI accelerators on the market today, including Nvidia's H100 and AMD's Instinct series, are designed primarily for training and then adapted for inference. OpenAI says it designed Jalapeño's architecture around its specific understanding of how large language models behave during inference.
The design targets four bottlenecks that matter at scale: data movement costs, compute-memory balance, networking efficiency, and overall system behavior. OpenAI opted for HBM memory rather than cheaper DRAM alternatives, prioritizing the low latency needed for reasoning and agentic workloads over raw cost savings.
Richard Ho, who leads OpenAI's hardware program, explained the approach: "Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers. We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models."
How big is the chip?
Analysis of the wafer images reveals Jalapeño's compute chiplet measures approximately 25.46mm by 33mm, putting the die size around 840mm². That's nearly the maximum achievable with current EUV lithography systems, where the reticle limit sits at 858mm².

The package contains one large compute chiplet surrounded by six HBM modules, plus a secondary chiplet that likely handles input/output interfaces. Two structural dummy dies fill out the package. The overall layout suggests Broadcom's influence: the wafer shows a highly regular, columnar floorplan with replicated compute regions typical of systolic-array accelerators.
Whether Jalapeño uses a true 2D systolic array, 1D/2D matrix engines, tensor tiles, or something else entirely isn't clear from the images. The architecture details remain undisclosed.
What do we actually know about performance?
Not much. OpenAI and Broadcom claim early testing shows Jalapeño's performance-per-watt is "substantially better than current state-of-the-art hardware." They also say engineering samples are running at target clock speed and power, executing workloads including something called GPT-5.3-Codex-Spark.
No benchmarks, memory configurations, clock speeds, or power figures were disclosed. The companies claim Jalapeño achieves utilization close to theoretical maximum, which would mean high efficiency in both cost and power terms. These claims should be treated skeptically until third-party validation arrives.
Even if Jalapeño outperforms today's AMD Instinct MI350-series and Nvidia Blackwell accelerators, the competitive picture will shift. AMD's MI400-series and Nvidia's Rubin architecture are both on the roadmap. OpenAI is building for a moving target.
Why the nine-month timeline matters
A reticle-sized ASIC typically takes 18 to 24 months or longer from design start to working silicon. Hitting nine months suggests either an extremely focused design scope, heavy reuse of existing Broadcom IP, or both. Broadcom has deep experience building custom accelerators for hyperscalers, and that infrastructure likely accelerated Jalapeño's development.
The speed also signals urgency. OpenAI's inference costs are enormous, and every percentage point of efficiency improvement translates to significant savings at their scale. Custom silicon is a long-term hedge against both GPU pricing and supply constraints.
Logicity's Take
OpenAI joining the custom silicon club was inevitable, but the speed and ambition are notable. Nine months for a reticle-sized ASIC suggests Broadcom did the heavy lifting on physical design while OpenAI specified the architecture. For AI builders watching this space, the strategic takeaway is clear: inference costs are the new battleground. Training gets the headlines, but serving billions of API requests daily is where the economics get brutal. If Jalapeño delivers on efficiency claims, expect OpenAI to either lower API prices or improve margins substantially. Either way, smaller teams running their own inference on cloud GPUs should watch what happens to ChatGPT's pricing over the next 12 to 18 months.
Frequently Asked Questions
When will OpenAI's Jalapeño chip be available?
OpenAI has not announced a production timeline. Engineering samples are running in labs, but deployment at scale typically takes another 6 to 12 months after this stage.
Will Jalapeño replace Nvidia GPUs for OpenAI?
Jalapeño is designed for inference only. OpenAI will still need Nvidia or AMD hardware for training. The chip is meant to reduce inference costs, not eliminate GPU dependence entirely.
How does Jalapeño compare to Google's TPUs?
Both are custom inference accelerators, but direct comparisons aren't possible without benchmarks. Google's TPUs are designed for their own workloads, while Jalapeño targets OpenAI's specific LLM behavior patterns.
Who manufactured the Jalapeño chip?
The announcement doesn't specify the foundry, but reticle-sized ASICs using advanced EUV lithography are typically fabricated by TSMC.
Will OpenAI sell Jalapeño chips to other companies?
There's no indication OpenAI plans to sell the hardware. The chip appears designed exclusively for internal use to reduce OpenAI's own inference costs.
Previous coverage of Jalapeño's efficiency claims
Need Help Implementing This?
Building AI products on OpenAI's APIs and wondering how infrastructure changes might affect your costs? Logicity helps product teams navigate the evolving AI landscape. Get in touch for a consultation on your AI strategy.
Source: Latest from Tom's Hardware
Huma Shazia
Senior AI & Tech Writer
Produced with AI assistance and reviewed by the Logicity editorial team. Learn more in our Editorial Policy.
Related Articles
Browse all
ChatGPT Images 2.0 Handles Hindi Text and Code Prompts
OpenAI's new image model was stress-tested with 10 demanding prompts, including Hindi billboard text, Python code rendering, and complex product packaging. The results show major improvements in text accuracy and character consistency over previous DALL-E models.

10 Ways to Use OpenAI Codex for Real Work Tasks
OpenAI Academy published a practical guide showing how Codex can automate daily briefings, weekly summaries, and workflow tasks by pulling context from calendars, email, and messaging apps. The guide includes ready-to-use prompts and customization tips.




