AMD vs. Nvidia: The Battle for AI GPU Supremacy Begins

AMD’s new H100 rival, the MI300X, can run a 40 billion parameter model on one chip

Ben Wodecki, Jr. Editor

June 21, 2023

At a Glance

  • AMD has unveiled the MI300X, its challenge to Nvidia's flagship GPU, the H100.
  • The MI300X can run a 40 billion parameter model on one chip.

Nvidia has long been the undisputed leader of AI GPUs. Now, AMD is poised to grab a piece of the market as Nvidia GPUs remain in short supply.

AMD has unveiled the MI300X, a rival to Nvidia’s new flagship semiconductor, the H100.

The MI300X is designed specifically for generative AI workloads and can run large language models such as Falcon-40B on a single chip, or “accelerator,” as AMD calls it. The aptly named Falcon-40B has 40 billion parameters, some 135 billion fewer than OpenAI’s now-outdated GPT-3 model, which has 175 billion.

The company is also looking to offer an OCP-compliant board called Instinct Platform that will comprise eight MI300Xs, not too dissimilar to Nvidia’s HGX offering.

The hardware is based on AMD’s CDNA 3 architecture and supports up to 192 GB of HBM3 memory. No real benchmarks or prices have been disclosed, however, although the company has been keen to stress that its new offering will give customers “a cloud, to edge, to endpoint portfolio of hardware products, with deep industry software collaboration, to develop scalable and pervasive AI solutions.”
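
As a rough sanity check on the single-chip claim, the sketch below estimates how much memory the model weights alone would need. It assumes 16-bit weights (2 bytes per parameter), which AMD has not confirmed as the setup behind its claim, and it ignores activations and other runtime overhead. The function names are illustrative. The 192 GB and eight-accelerator figures come from the article, and 175 billion is GPT-3's published parameter count.

```python
# Back-of-the-envelope estimate of weight storage for a large language model.
# Assumption (not from AMD): weights are stored in 16-bit precision, 2 bytes each.
BYTES_PER_PARAM_16BIT = 2
GB = 1e9  # decimal gigabytes, good enough for a rough comparison

MI300X_MEMORY_GB = 192      # HBM3 capacity quoted in the article
INSTINCT_PLATFORM_GPUS = 8  # accelerators per Instinct Platform board


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM_16BIT):
    """Approximate memory for weights only (no activations or caches)."""
    return num_params * bytes_per_param / GB


def fits_on_one_accelerator(num_params, memory_gb=MI300X_MEMORY_GB):
    """True if the weights alone fit in a single accelerator's memory."""
    return weight_memory_gb(num_params) <= memory_gb


falcon_40b_params = 40e9
gpt3_params = 175e9

print(weight_memory_gb(falcon_40b_params))         # 80.0
print(fits_on_one_accelerator(falcon_40b_params))  # True
print(fits_on_one_accelerator(gpt3_params))        # False
print(MI300X_MEMORY_GB * INSTINCT_PLATFORM_GPUS)   # 1536
```

Under these assumptions, a 40-billion-parameter model needs about 80 GB for its weights, which fits within 192 GB, while a GPT-3-sized model would need to be split across several accelerators.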

Alexander Harrowell, principal analyst for advanced computing at sister research firm Omdia, said the MI300X and its sibling, the MI300A, are somewhat similar, but what sets the MI300X apart is its larger memory capacity.

He said: “192GB (for the MI300X) – compared with 128GB on the MI300A and 120GB on the H100 – and it’s a pure GPU, while the MI300A is actually more like the Nvidia Grace Hopper, a multichip module containing both a CPU and a GPU.”

“This seems to be where we’re going as an industry – AMD is doing it, Nvidia is doing it, Intel was doing it but that’s been put off. Both parts are screamingly powerful (thousands of TOPS) and also very hot – Nvidia’s is 700W TDP and AMD beat even that at 750W.”

H100’s head start as GPUs are hot stuff

One key difference between the H100 and AMD’s new MI300X is that the former has a head start. Nvidia revealed the H100 in March 2022 and officially began shipping DGX H100 systems to customers in early May this year.

Customers have had far more time to get familiar with Nvidia’s flagship chip, and with its pre-established dominant position in AI GPUs, it’s tough to see the MI300X coming in and shaking things up straight away.

That said, everyone is after GPUs right now. The explosion of interest in generative AI following ChatGPT’s success has led to companies everywhere trying to invest in AI – and it’s making it harder to get chips.

The likes of OpenAI, Microsoft, Adobe and even Twitter are rushing to snap up GPUs left and right to train large foundation models. These large-scale players have the resources to buy huge swathes of chips, with H100s costing around $30,000 each.

Take ByteDance, the parent company of TikTok, which has spent around $1 billion on Nvidia GPUs this year, split across A100s and the Chinese-market-specific A800s (essentially a cut-down A100 built to comply with U.S. export rules), according to reports in Chinese publication Jiwei.

GPUs are so scarce that investors have turned to buying them for startups to help accelerate their work.

Former GitHub CEO Nat Friedman and serial investor Daniel Gross snapped up thousands of GPUs to form the Andromeda Cluster. This mammoth computing system contains 2,512 H100s that startups can use to train a 65-billion parameter version of the LLaMA AI model in just 10 days.
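
To put those numbers in perspective, here is a minimal sketch of the arithmetic, using only figures quoted in this article: 2,512 H100s, a 10-day training run, and a price of around $30,000 per H100. The cost figure covers the GPUs alone and is a rough assumption, since it leaves out networking, servers, power and any volume discounts.

```python
# Rough arithmetic on the Andromeda Cluster using figures quoted in the article.
H100_COUNT = 2_512
TRAINING_DAYS = 10
APPROX_H100_PRICE_USD = 30_000  # approximate per-chip price; actual deals may differ

HOURS_PER_DAY = 24

# Total GPU-hours consumed if every H100 runs for the full 10 days.
gpu_hours = H100_COUNT * TRAINING_DAYS * HOURS_PER_DAY

# Hardware outlay for the GPUs alone (excludes servers, networking and power).
gpu_cost_usd = H100_COUNT * APPROX_H100_PRICE_USD

print(f"{gpu_hours:,} GPU-hours")     # 602,880 GPU-hours
print(f"${gpu_cost_usd:,} in GPUs")   # $75,360,000 in GPUs
```

Even on these simplified assumptions, the GPUs alone represent an outlay of roughly $75 million, which helps explain why pooled clusters are attractive to startups.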

Could AMD challenge the market leader with its MI300X? For now, AMD said it will only let “key customers” sample the chip, beginning in the third quarter. AMD has already begun letting customers sample the MI300A, which can power HPC and AI workloads, but those wanting the MI300X may be forced to wait. Meanwhile, Nvidia has begun shipping its H100s.

Harrowell said the issue isn’t so much the hardware, but whether AMD can persuade developers to adopt its software.

“Nvidia’s SDK, CUDA, is the industry standard and they have built a huge range of tools on top of that for training large models, inference-serving, optimization, transfer learning, infrastructure management and operations automation, and a lot of vertical-specific applications.”

“By contrast, AMD’s equivalent, ROCm, has struggled. There have been some improvements, but it was telling how little AMD had to say about it on the ‘premiere’ event they held to announce the new chip.”

The Omdia analyst said his team has been expecting GPU growth to pick up and that, for AMD, it “helps to get in the game now” even if its product “won’t ramp until the end of this year and early next year.”

“There are likely to be delays for everyone, but Nvidia got in first – a big advantage. AMD also makes its market-leading Epyc server CPUs, which are absolutely critical to their business, using the same packaging process, so they will have to think hard about how many wafers to dedicate to the MI300s, which are a risk, and how many CPUs, which they know they can sell.”

AMD eyes the Emerald Isle

AMD also announced plans to invest $135 million over four years in facilities in Ireland. The funds will cover R&D projects as well as create 300 engineering and research jobs. It said the research projects will cover “next-generation AI” as well as data centers and 6G.

AMD’s investment is being supported by the Irish government through IDA Ireland, the state agency responsible for attracting foreign direct investment to the country. The investment will “bolster” the country’s tech sector and create long-term career opportunities, said Simon Coveney, Ireland’s Minister for Enterprise, Trade and Employment.

AMD’s presence in Ireland stems from its 2022 acquisition of Xilinx. The country is now home to one of the largest AMD R&D sites in Europe and has delivered products with “significant” commercial success such as the AMD Zynq UltraScale+ RFSoC semiconductor product family.

About the Author(s)

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.