AMD Instinct™ GPUs Continue AI Momentum Across Industry Benchmarks and Today’s Most Demanding AI Models

April 2, 2025 by Ronak Shah

Customers evaluating AI infrastructure today rely on a combination of industry-standard benchmarks and real-world model performance metrics—such as those from Llama 3.1 405B, DeepSeek-R1, and other leading open-source models—to guide their GPU purchase decisions.

At AMD, we believe that delivering value across both dimensions is essential to driving broader AI adoption and real-world deployment at scale. That’s why we take a holistic approach: optimizing performance for rigorous industry benchmarks like MLPerf while also enabling Day 0 support and rapid tuning for the models our customers use most widely in production. This strategy helps ensure AMD Instinct™ GPUs deliver not only strong, standardized performance, but also high-throughput, scalable AI inferencing across the latest generative and language models.

In this blog, we explore how AMD’s continued investment in benchmarking, open model enablement, and software and ecosystem tools helps unlock greater value for customers: from MLPerf Inference 5.0 results to Llama 3.1 405B and DeepSeek-R1 performance, ROCm software advances, and beyond.

Series of Firsts for AMD Instinct in MLPerf Inference 5.0

In the MLPerf Inference 5.0 round, AMD marked a milestone with a series of significant firsts highlighting our growing momentum in this key industry standard benchmark.

  • We submitted our first-ever MLPerf inference results for the AMD Instinct MI325X, our latest-generation Instinct GPU, launched in October 2024.
  • We supported the first-ever multi-node submission using an AMD Instinct solution, in collaboration with a partner.
  • We enabled multiple partners to submit results using our latest MI325X GPUs for the first time.

Growing Industry Adoption and Broadening Our Presence

We are proud that multiple partners have successfully submitted MLPerf results using AMD Instinct GPUs for the first time: Supermicro (SMC), ASUS, and Gigabyte (GCT) with Instinct MI325X, and MangoBoost with Instinct MI300X.


All partner submissions with Instinct MI325X on Llama 2 70B achieved performance comparable to AMD’s own submitted results (Figure 1), underscoring the consistency and reliability of our GPUs across diverse environments.

In addition to Llama 2 70B, AMD has extended its submissions to include Stable Diffusion XL (SDXL) with the latest Instinct MI325X GPUs, demonstrating competitive performance in generative AI workloads (see Figure 1). Our unique GPU partitioning techniques played a pivotal role in achieving competitive performance versus the NVIDIA H200 in our inaugural SDXL submission.


Figure 1: AMD (1× Instinct MI325X node, MLPerf 5.0) vs. NVIDIA (1× H200 node) Submission Results for Llama 2 70B and SDXL Benchmarks

Beyond MLPerf, AMD continues to help customers confidently deploy the most advanced AI models at scale. We recently delivered Day 0 support for Google’s Gemma 3 models, helping enable early access to high-performance inference on AMD Instinct GPUs. Our ongoing work with Llama 3.1 405B and DeepSeek-R1 has also delivered leadership performance through rapid, ROCm software-led advancements. We dive deeper into these performance highlights later in this blog, so keep reading!

Proving Scalability: A Record-Breaking Multi-Node Submission

MangoBoost, a provider of advanced system solutions that maximize AI data center efficiency, made the first-ever partner submission to MLPerf utilizing multiple nodes of AMD Instinct solutions, specifically four nodes of Instinct MI300X. Notably, this submission set a new benchmark, achieving the highest-ever offline performance recorded in MLPerf submissions for the Llama 2 70B benchmark (see Figure 2). This submission validates the scalability and performance of AMD Instinct solutions in multi-node AI workloads.


Figure 2: MangoBoost (4× Instinct MI300X nodes, MLPerf 5.0) vs. AMD (1× Instinct MI300X node, MLPerf 4.1) Submission Results for Llama 2 70B Benchmark

MLPerf Performance Insights

At the core of strong AMD MLPerf Inference 5.0 results is the synergy between Instinct MI325X hardware and ROCm™-driven software innovation.

Each MI325X node offers 2.048 TB of HBM3e memory (256 GB per GPU, with up to 6 TB/s of memory bandwidth per GPU), enabling models like Llama 2 70B and SDXL, KV cache included, to run entirely in the memory of a single GPU, avoiding cross-GPU communication overhead and maximizing throughput.
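To make the single-GPU claim concrete, here is a back-of-the-envelope sizing sketch. The arithmetic is our own illustration, not data from the AMD submission, and assumes FP8 weights and Llama 2 70B’s published architecture (80 layers, grouped-query attention with 8 KV heads of dimension 128):

```python
# Rough single-GPU memory check for Llama 2 70B on a 256 GB MI325X.
# Our own back-of-the-envelope arithmetic, not AMD submission data.
weight_bytes = 70e9 * 1                      # FP8: 1 byte/param -> ~70 GB

layers, kv_heads, head_dim = 80, 8, 128      # published Llama 2 70B config
kv_bytes_per_token = 2 * layers * kv_heads * head_dim  # K and V, FP8
# -> ~160 KB of KV cache per token in flight

hbm_bytes = 256e9                            # MI325X HBM3e per GPU
kv_budget = hbm_bytes - weight_bytes         # ~186 GB left for KV cache
print(f"~{kv_budget / kv_bytes_per_token / 1e6:.1f}M cached tokens fit "
      f"alongside the weights on one GPU")   # ~1.1M tokens
```

With the whole model plus a million-token-scale KV budget resident on a single GPU, serving needs no tensor-parallel communication at all, which is one reason single-GPU execution maximizes throughput here.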

The latest AMD bi-weekly ROCm containers, available via Infinity Hub, brought key optimizations in kernel scheduling, GEMM tuning, and inference efficiency, helping unlock the full potential of the MI325X. Additionally, the AMD Quark tool enabled FP16-to-FP8 quantization, while improvements to vLLM and memory handling further boosted inference performance.
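While the MLPerf submission used the AMD Quark tool for quantization, a simple way to experiment with FP8 serving is vLLM’s built-in FP8 path. Below is a minimal sketch; the model identifier and settings are illustrative assumptions, not the tuned MLPerf configuration:

```python
# Minimal FP8 inference sketch with vLLM (illustrative; not the tuned
# MLPerf configuration). Assumes a ROCm build of vLLM and access to
# the Llama 2 70B weights.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # assumed model identifier
    quantization="fp8",       # on-the-fly FP16-to-FP8 weight quantization
    tensor_parallel_size=1,   # the FP8 model fits a single 256 GB GPU
)

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["Explain KV caching in one paragraph."], params)
print(outputs[0].outputs[0].text)
```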

The latest updates across the ROCm ecosystem are poised to further enhance AMD’s future MLPerf performance and help Instinct customers scale AI workloads more efficiently. The new AI Tensor Engine for ROCm (AITER) accelerates critical operations like GEMM, Attention, and Mixture-of-Experts using drop-in, pre-optimized kernels, delivering up to 17× faster decoder execution, 14× improvements in Multi-Head Attention, and over 2× throughput in LLM inference. Read more about AITER here.

AMD also recently introduced Open Performance and Efficiency Architecture (OPEA), a cross-platform framework offering deep telemetry across compute, memory, and power. Integrated with ROCm and compatible with PyTorch, Triton, and multi-GPU setups, OPEA helps Instinct customers optimize performance and scale from edge to cloud. Learn more about OPEA here.

In addition, the AMD GPU Operator simplifies Kubernetes-native deployment of AMD GPUs for production AI environments. Recent updates include enhanced automation, multi-instance GPU (MIG) support, and deeper ROCm integration—reducing operational overhead and accelerating time-to-value for Instinct users. Explore our AI Inference Orchestration with Kubernetes on Instinct blog series here: Part 1, Part 2, Part 3.
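As an illustration of what Kubernetes-native scheduling looks like from the application side, the sketch below uses the official Kubernetes Python client to request one AMD GPU through the amd.com/gpu resource advertised by the operator’s device plugin. The pod name and container image are placeholders:

```python
# Illustrative sketch: scheduling a pod onto an AMD Instinct GPU with
# the Kubernetes Python client. Assumes the AMD GPU Operator (device
# plugin) is installed and advertises the "amd.com/gpu" resource.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() inside a cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="rocm-inference-demo"),  # placeholder
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference",
                image="rocm/vllm:latest",  # placeholder image tag
                resources=client.V1ResourceRequirements(
                    limits={"amd.com/gpu": "1"}  # one Instinct GPU
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```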

Together, these enhancements will continue to help AMD deliver strong results in MLPerf submissions while providing even greater value and scalability for Instinct customers.

Maintaining Strong Performance on the Latest and Most Advanced Open-Source Models

Building on our MLPerf success, AMD continues to deliver exceptional performance on leading open-source AI models, notably DeepSeek-R1 and Llama 3.1 405B.

Optimized for AMD Instinct™ MI300X GPUs, DeepSeek-R1 benefits from rapid ROCm™ optimizations, achieving a 4× inference speed boost in just 14 days. While MI300X is positioned against NVIDIA’s H100, its DeepSeek-R1 performance rivals the H200 (see Figure 3), making it an excellent choice for scalability, high throughput, and efficiency. Read more on how to reproduce this benchmark here.
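For readers who want to sanity-check throughput on their own hardware before following the full reproduction guide, a rough measurement sketch with vLLM is shown below. The model identifier and parallelism are our assumptions, and this is not AMD’s benchmark harness:

```python
# Rough throughput-measurement sketch (not AMD's benchmark harness).
# Assumes a ROCm build of vLLM with native DeepSeek-R1 support and an
# 8-GPU MI300X node; tensor_parallel_size=8 shards the model across it.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-R1", tensor_parallel_size=8)
params = SamplingParams(temperature=0.6, max_tokens=512)
prompts = ["Summarize the MLPerf Inference benchmark."] * 32

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{tokens / elapsed:.1f} generated tokens/s over {len(prompts)} prompts")
```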


Figure 3: AMD (1× Instinct MI300X node) vs. NVIDIA (1× H200 node) Performance Results for DeepSeek-R1 Benchmark

The Llama 3.1 405B model has been optimized for AMD Instinct™ MI300X GPUs, performance leadership that helped make AMD the exclusive inferencing solution for Meta’s frontier model at launch. MI300X outperforms NVIDIA’s H100 in memory-bound workloads thanks to its higher memory capacity and bandwidth, while also reducing infrastructure costs by requiring fewer GPUs, and therefore fewer nodes, for large models. With Day 0 support, AMD helped ensure seamless deployment and optimization of this cutting-edge model from the start. Read more on how to reproduce this benchmark here.
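To see why fewer GPUs suffice, here is a back-of-the-envelope capacity calculation. It is our own illustration using published per-GPU HBM capacities, and it assumes FP8 weights with roughly 20% headroom for KV cache and activations:

```python
import math

# Our own back-of-the-envelope arithmetic, not AMD benchmark data.
# Llama 3.1 405B in FP8 is ~405 GB of weights; assume ~20% headroom
# for KV cache and activations. HBM sizes are published per-GPU figures.
needed_gb = 405 * 1.2

for gpu, hbm_gb in {"MI300X": 192, "H100": 80}.items():
    print(f"{gpu}: at least {math.ceil(needed_gb / hbm_gb)} GPUs "
          f"({hbm_gb} GB HBM each)")
# -> MI300X: 3 GPUs (TP4 in practice, half an 8-GPU node)
# -> H100:   7 GPUs (TP8 in practice, a full 8-GPU node)
```

Under these assumptions, MI300X can serve the model at a lower tensor-parallel degree than H100-class GPUs, which is part of the context for the TP4 and TP8 configurations compared in Figure 4.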


Figure 4: AMD (1× Instinct MI300X node) vs. NVIDIA (1× H100 node) Llama 3.1 405B FP8 Throughput vs. Latency with TP4 and TP8

Continued Momentum and Commitment to Transparency

AMD’s investments in AI scalability, performance, software advancements, and an open-source strategy are evident in our MLPerf v5.0 results, industry collaborations, and optimizations for cutting-edge models like DeepSeek-R1 and Llama 3.1 405B. With MI300X and MI325X, we deliver scalable, high-performance AI solutions that drive efficiency and cost-effectiveness.

As we push AI forward, AMD remains dedicated to transparency, innovation, and empowering customers to scale AI with confidence. Stay tuned for our next MLPerf submission—we look forward to sharing our progress and insights with you.

AMD remains committed to open source and transparency. All results can be reproduced by following the instructions in our ROCm blog post, with full submission results available on the MLCommons website, and source artifacts available in this repository.

Key Contributors: Meena Arunachalam, Miro Hodak, Mahesh Balasubramanian, David Szabados, Aaron Grabein