AMD Launches MI350P: A 144GB PCIe Card Challenging Nvidia

Key Takeaways

- The MI350P runs on CDNA4 architecture with 144GB HBM3E and 4TB/s bandwidth in a standard PCIe dual-slot form factor
- AMD claims roughly 40% faster FP16 and FP8 theoretical compute versus Nvidia's H200 NVL
- The card supports 450W or 600W power configurations for different server environments

AMD has unveiled the Instinct MI350P, a PCIe AI accelerator that brings the company's latest CDNA4 architecture to standard rack-mounted servers. The card packs 144GB of HBM3E memory and aims to give data centers a straightforward upgrade path for AI inference workloads.
The MI350P is designed as a drop-in solution for existing air-cooled servers. It fits into a 10.5-inch dual-slot form factor with a fanless design that relies on chassis airflow. AMD rates it for 600W but includes a 450W configuration for power-constrained environments.
What's Inside the MI350P
The MI350P runs on AMD's CDNA4 architecture, built on TSMC's 3nm and 6nm FinFET processes. It features 8,192 cores, 128 compute units, and 512 Matrix Cores with a maximum clock speed of 2.2GHz. The GPU pairs with 144GB of HBM3E memory offering 4TB/s of bandwidth and includes a 128MB last-level cache.
These specs are exactly half of what AMD's flagship MI350X and MI355X OAM accelerators offer. The tradeoff is clear: you get a card that fits into standard PCIe infrastructure instead of requiring specialized OAM platforms.
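The "exactly half" relationship can be checked against the spec table later in this article. A quick sketch (using only the figures quoted in this piece):

```python
# Sanity check: MI350P specs versus the MI350X OAM flagship,
# using the numbers from this article's spec table.
mi350p = {"memory_gb": 144, "bandwidth_tbs": 4, "fp16_pflops": 2.3, "fp8_pflops": 4.6}
mi350x = {"memory_gb": 288, "bandwidth_tbs": 8, "fp16_pflops": 4.6, "fp8_pflops": 9.2}

for spec in mi350p:
    ratio = mi350x[spec] / mi350p[spec]
    print(f"{spec}: MI350X / MI350P = {ratio:.1f}x")
# Each ratio works out to 2.0x, matching the "exactly half" claim.
```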

Performance Claims and Target Workloads
AMD claims the MI350P delivers roughly 40% faster FP16 and FP8 theoretical compute than Nvidia's H200 NVL, currently the leading PCIe AI accelerator. The company estimates 2,299 TFLOPs at FP16 and a peak of 4,600 TFLOPs using MXFP4.
The card supports lower-precision MXFP6 and MXFP4 formats natively, matching the capabilities of the higher-end MI350X and MI355X. These formats accelerate large language model inference, where reduced precision can speed up computations without meaningfully affecting output quality.
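To illustrate why reduced precision matters for inference, here is a rough weight-memory estimate at each format. The 70B parameter count is a hypothetical example (not a model AMD cited), and the math ignores KV cache, activations, and MXFP4's block-scale overhead:

```python
# Approximate weight footprint of a hypothetical 70B-parameter LLM
# at different precisions, versus the MI350P's 144GB of HBM3E.
params_billion = 70
bytes_per_weight = {"FP16": 2.0, "FP8": 1.0, "MXFP4": 0.5}  # MXFP4 ~4 bits/weight
card_memory_gb = 144

for fmt, nbytes in bytes_per_weight.items():
    weights_gb = params_billion * nbytes  # 1B params at 1 byte each ~= 1 GB
    fits = "fits" if weights_gb < card_memory_gb else "does not fit"
    print(f"{fmt}: ~{weights_gb:.0f} GB of weights -> {fits} in {card_memory_gb} GB")
```

At FP16 the weights alone (~140 GB) barely squeeze into a single card, while FP8 and MXFP4 leave substantial headroom for KV cache and batching, which is where the inference speedups come from.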
AMD is targeting inference and retrieval-augmented generation (RAG) pipelines with this card. Up to eight MI350P cards can work together in a single system, letting data centers scale performance incrementally.
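The theoretical ceiling of a fully populated eight-card system follows directly from the per-card specs. A minimal sketch, ignoring interconnect and scaling overhead (these are peak figures only):

```python
# Aggregate theoretical capacity of an eight-card MI350P system,
# multiplying out the per-card specs quoted in this article.
cards = 8
memory_gb, bandwidth_tbs, fp8_pflops = 144, 4, 4.6

print(f"Total HBM3E: {cards * memory_gb} GB")                 # 1152 GB
print(f"Aggregate bandwidth: {cards * bandwidth_tbs} TB/s")   # 32 TB/s
print(f"Peak FP8: {cards * fp8_pflops:.1f} PFLOPs")           # 36.8 PFLOPs
```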
How It Stacks Up: MI350 Family Specs
| Spec | MI350P PCIe | MI325X OAM | MI350X OAM | MI355X OAM |
|---|---|---|---|---|
| Architecture | CDNA 4 | CDNA 3 | CDNA 4 | CDNA 4 |
| Memory | 144GB HBM3E | 256GB HBM3E | 288GB HBM3E | 288GB HBM3E |
| Memory Bandwidth | 4 TB/s | 6 TB/s | 8 TB/s | 8 TB/s |
| FP64 Performance | 36 TFLOPs | — | 72 TFLOPs | 78.6 TFLOPs |
| FP16 Performance | 2.3 PFLOPs | 2.61 PFLOPs | 4.6 PFLOPs | 5 PFLOPs |
| FP8 Performance | 4.6 PFLOPs | 5.22 PFLOPs | 9.2 PFLOPs | 10.1 PFLOPs |
| FP4 Performance | — | — | 18.45 PFLOPs | 20.1 PFLOPs |
Why PCIe Still Matters
The MI350P fills a gap in AMD's lineup. While the MI350X and MI355X target purpose-built AI infrastructure with OAM form factors, many organizations run AI workloads on existing server hardware. A PCIe card that slots into standard infrastructure reduces the barrier to adoption.
Nvidia's H200 NVL currently dominates this segment. If AMD's 40% performance claims hold up in real-world testing, the MI350P could give enterprises a compelling alternative, particularly those already running AMD systems or looking to diversify their GPU suppliers.
What We Don't Know Yet
AMD has not announced pricing or availability for the MI350P. The 40% performance advantage is a theoretical compute claim, which rarely translates directly to real-world workloads. Independent benchmarks comparing the MI350P to the H200 NVL on actual inference tasks will be crucial.
Software maturity is another consideration. Nvidia's CUDA ecosystem has years of optimization and broad framework support. AMD's ROCm platform has improved but still trails in some areas. Enterprises will weigh hardware performance against the total cost of software migration.
Frequently Asked Questions
How much memory does the AMD MI350P have?
The MI350P comes with 144GB of HBM3E memory offering 4TB/s of bandwidth.
Is the MI350P faster than Nvidia's H200 NVL?
AMD claims roughly 40% faster FP16 and FP8 theoretical compute compared to the H200 NVL. Real-world performance will depend on specific workloads.
What power does the MI350P require?
The card can be configured for either 600W or 450W operation, depending on server thermal and power constraints.
How many MI350P cards can run in one system?
Up to eight MI350P cards can be combined in a single system for scaled performance.
What architecture does the MI350P use?
The MI350P runs on AMD's CDNA4 architecture, built on TSMC's 3nm and 6nm FinFET processes.
Source: Latest from Tom's Hardware
Huma Shazia
Senior AI & Tech Writer