AMD Launches MI350P: A 144GB PCIe Card Challenging Nvidia

Key Takeaways

- The MI350P runs on CDNA4 architecture with 144GB HBM3E and 4TB/s bandwidth in a standard PCIe dual-slot form factor
- AMD claims roughly 40% faster FP16 and FP8 theoretical compute versus Nvidia's H200 NVL
- The card supports 450W or 600W power configurations for different server environments

AMD has unveiled the Instinct MI350P, a PCIe AI accelerator that brings the company's latest CDNA4 architecture to standard rack-mounted servers. The card packs 144GB of HBM3E memory and aims to give data centers a straightforward upgrade path for AI inference workloads.
The MI350P is designed as a drop-in solution for existing air-cooled servers. It fits into a 10.5-inch dual-slot form factor with a fanless design that relies on chassis airflow. AMD rates it for 600W but includes a 450W configuration for power-constrained environments.
What's Inside the MI350P
The MI350P runs on AMD's CDNA4 architecture, built on TSMC's 3nm and 6nm FinFET processes. It features 8,192 cores, 128 compute units, and 512 Matrix Cores with a maximum clock speed of 2.2GHz. The GPU pairs with 144GB of HBM3E memory offering 4TB/s of bandwidth and includes a 128MB last-level cache.
These specs are exactly half of what AMD's flagship MI350X and MI355X OAM accelerators offer. The tradeoff is clear: you get a card that fits into standard PCIe infrastructure instead of requiring specialized OAM platforms.

Performance Claims and Target Workloads
AMD claims the MI350P delivers roughly 40% faster FP16 and FP8 theoretical compute compared to Nvidia's H200 NVL, the current top PCIe AI accelerator. The company estimates 2,299 TFLOPs of FP16 performance and 4,600 peak TFLOPs using the MXFP4 format.
The card supports lower-precision MXFP6 and MXFP4 formats natively, matching the capabilities of the higher-end MI350X and MI355X. These formats accelerate large language model inference, where reduced precision can speed up computations without meaningfully affecting output quality.
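To make the low-precision idea concrete, here is a minimal sketch of block-scaled 4-bit quantization in the spirit of MXFP4. The details are assumptions drawn from the OCP Microscaling format family, not from AMD's announcement: values are grouped into blocks that share one power-of-two scale, and each element is rounded to the nearest FP4 (E2M1) magnitude from the set {0, 0.5, 1, 1.5, 2, 3, 4, 6}.

```python
import math

# FP4 (E2M1) representable magnitudes, per the OCP Microscaling spec.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(values):
    """Quantize one block of floats to a shared scale plus FP4 values.

    Returns (scale, dequantized_values) so the rounding error is visible.
    """
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 1.0, [0.0] * len(values)
    # Power-of-two scale chosen so the largest magnitude fits under 6.0,
    # the FP4 maximum, without clipping.
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
    quantized = []
    for v in values:
        mag = min(abs(v) / scale, 6.0)
        nearest = min(FP4_MAGNITUDES, key=lambda m: abs(m - mag))
        quantized.append(math.copysign(nearest, v) * scale)
    return scale, quantized
```

Real MXFP4 hardware packs the 4-bit codes and the shared 8-bit scale rather than materializing dequantized floats; the sketch only shows why a block-shared scale preserves most of the dynamic range that a flat 4-bit format would lose.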
AMD is targeting inference and retrieval-augmented generation (RAG) pipelines with this card. Up to eight MI350P cards can work together in a single system, letting data centers scale performance incrementally.
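The capacity math behind that eight-card ceiling is straightforward. The sketch below uses only the figures from the article (144GB per card, up to eight cards); the one-byte-per-parameter FP8 rule of thumb and the 1.2x overhead factor for KV cache and activations are illustrative assumptions, not AMD guidance.

```python
# Aggregate memory across the maximum eight-card MI350P configuration.
CARD_MEMORY_GB = 144
MAX_CARDS = 8

total_gb = CARD_MEMORY_GB * MAX_CARDS  # 1152 GB of HBM3E

# Hypothetical sizing rule: an FP8 model needs ~1 byte per parameter,
# plus headroom for KV cache and activations (the 1.2 factor is assumed).
def fits_in_system(params_billions, bytes_per_param=1, overhead=1.2):
    needed_gb = params_billions * bytes_per_param * overhead
    return needed_gb <= total_gb

print(total_gb)              # 1152
print(fits_in_system(700))   # True: ~840 GB needed
```

By this rough measure, a hypothetical 700-billion-parameter model quantized to FP8 would fit comfortably across a full eight-card system.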
How It Stacks Up: MI350 Family Specs
| Spec | MI350P PCIe | MI325X OAM | MI350X OAM | MI355X OAM |
|---|---|---|---|---|
| Architecture | CDNA 4 | CDNA 3 | CDNA 4 | CDNA 4 |
| Memory | 144GB HBM3E | 256GB HBM3E | 288GB HBM3E | 288GB HBM3E |
| Memory Bandwidth | 4 TB/s | 6 TB/s | 8 TB/s | 8 TB/s |
| FP64 Performance | 36 TFLOPs | — | 72 TFLOPs | 78.6 TFLOPs |
| FP16 Performance | 2.3 PFLOPs | 2.61 PFLOPs | 4.6 PFLOPs | 5 PFLOPs |
| FP8 Performance | 4.6 PFLOPs | 5.22 PFLOPs | 9.2 PFLOPs | 10.1 PFLOPs |
| FP4 Performance | — | — | 18.45 PFLOPs | 20.1 PFLOPs |
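As a sanity check on the "exactly half" claim, a few lines of Python confirm the relationship using only the figures from the table above:

```python
# Spec figures copied from the table (PCIe MI350P vs. OAM MI350X).
mi350p = {"memory_gb": 144, "bandwidth_tbs": 4, "fp64_tflops": 36,
          "fp16_pflops": 2.3, "fp8_pflops": 4.6}
mi350x = {"memory_gb": 288, "bandwidth_tbs": 8, "fp64_tflops": 72,
          "fp16_pflops": 4.6, "fp8_pflops": 9.2}

# Every listed spec of the PCIe card is half the OAM flagship's.
ratios = {k: mi350x[k] / mi350p[k] for k in mi350p}
print(ratios)  # every ratio is 2.0
```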
Why PCIe Still Matters
The MI350P fills a gap in AMD's lineup. While the MI350X and MI355X target purpose-built AI infrastructure with OAM form factors, many organizations run AI workloads on existing server hardware. A PCIe card that slots into standard infrastructure reduces the barrier to adoption.
Nvidia's H200 NVL currently dominates this segment. If AMD's 40% performance claims hold up in real-world testing, the MI350P could give enterprises a compelling alternative, particularly those already running AMD systems or looking to diversify their GPU suppliers.
What We Don't Know Yet
AMD has not announced pricing or availability for the MI350P. The 40% performance advantage is a theoretical compute claim, which rarely translates directly to real-world workloads. Independent benchmarks comparing the MI350P to the H200 NVL on actual inference tasks will be crucial.
Software maturity is another consideration. Nvidia's CUDA ecosystem has years of optimization and broad framework support. AMD's ROCm platform has improved but still trails in some areas. Enterprises will weigh hardware performance against the total cost of software migration.
Frequently Asked Questions
How much memory does the AMD MI350P have?
The MI350P comes with 144GB of HBM3E memory offering 4TB/s of bandwidth.
Is the MI350P faster than Nvidia's H200 NVL?
AMD claims roughly 40% faster FP16 and FP8 theoretical compute compared to the H200 NVL. Real-world performance will depend on specific workloads.
What power does the MI350P require?
The card can be configured for either 600W or 450W operation, depending on server thermal and power constraints.
How many MI350P cards can run in one system?
Up to eight MI350P cards can be paired together in a single system for scaled performance.
What architecture does the MI350P use?
The MI350P runs on AMD's CDNA4 architecture, built on TSMC's 3nm and 6nm FinFET processes.
Source: Tom's Hardware
Huma Shazia
Senior AI & Tech Writer