AI Business is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 3099067.

The 5-petaflop Nvidia DGX A100 hopes to run your AI workloads

by Sebastian Moss

While HGX will do it for the cloud

What do you get if you take eight of Nvidia’s new A100 GPUs, two 64-core AMD Rome CPUs, six NVSwitches, 15TB of PCIe Gen4 NVMe SSD storage and nine Mellanox 200Gbps network interfaces, then package them all together?

Well, a bill for $199,000, but also a lot of AI performance. Nvidia’s latest DGX reference architecture is the company’s preferred way of shipping its highest-performance chips.

The DGX A100, as the latest iteration is named, is rated at five petaflops of FP16 performance, 2.5 petaflops of TF32 and 156 teraflops of FP64. It also delivers 10 petaops (integer operations, not floating-point operations) of INT8 performance.
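The headline figures are eight times a single A100's peak rate. The sketch below assumes Nvidia's published per-GPU Tensor Core peaks (624 teraflops FP16, 312 teraflops TF32 and 1,248 teraops INT8, all with structured sparsity, plus 19.5 teraflops FP64); those per-GPU numbers and the helper name `system_peak` are illustrative assumptions, not figures from this article.

```python
from __future__ import annotations

# Assumed per-GPU peaks for one A100, in tera-units (teraflops or teraops).
# FP16, TF32 and INT8 values assume Nvidia's structured sparsity.
PER_GPU_PEAK_TERA = {
    "FP16": 624.0,   # teraflops, with sparsity
    "TF32": 312.0,   # teraflops, with sparsity
    "FP64": 19.5,    # teraflops, FP64 Tensor Core (no sparsity)
    "INT8": 1248.0,  # teraops, with sparsity
}

GPUS_PER_DGX = 8  # the DGX A100 carries eight A100 GPUs


def system_peak(per_gpu: dict[str, float], gpus: int) -> dict[str, float]:
    """Scale each per-GPU peak (in tera-units) by the number of GPUs."""
    return {fmt: value * gpus for fmt, value in per_gpu.items()}


if __name__ == "__main__":
    for fmt, tera in system_peak(PER_GPU_PEAK_TERA, GPUS_PER_DGX).items():
        # Show both tera- and peta-scale values for comparison with the article.
        print(f"{fmt}: {tera:,.1f} tera = {tera / 1000:.2f} peta")
```

The products come out at about 4.99 petaflops FP16, 2.50 petaflops TF32, 156 teraflops FP64 and 9.98 petaops INT8, which round to the rated figures quoted above.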

AI ready

“Nvidia DGX A100 is the ultimate instrument for advancing AI,” Jensen Huang, the ebullient company CEO, said as he unveiled the product during Nvidia’s now-virtual GTC.

“Nvidia DGX is the first AI system built for the end-to-end machine learning workflow, from data analytics to training to inference. And with the giant performance leap of the new DGX, machine learning engineers can stay ahead of the exponentially growing size of AI models and data.”

Among the first customers of the DGX A100, which has 320GB of GPU memory for training on large AI datasets, is Argonne National Laboratory. Rick Stevens, associate laboratory director at the Department of Energy facility, said that the system would be used “in the fight against COVID-19.”

He added: “The compute power of the new DGX A100 systems coming to Argonne will help researchers explore treatments and vaccines and study the spread of the virus, enabling scientists to do years’ worth of AI-accelerated work in months or days.”

Nvidia has also released a version of the DGX on steroids: the DGX SuperPOD reference architecture, which clusters 140 DGX A100 systems together for 700 petaflops of 'AI computing power.'

So far, the SuperPOD has just one customer: Nvidia itself. The company plans to install four of the pods in its internal Saturn V supercomputer, adding 2.8 exaflops of AI computing power for a total of 4.6 exaflops.
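The SuperPOD and Saturn V numbers follow from the same five-petaflop per-system rating. A minimal sketch, working in whole petaflops to avoid floating-point rounding; the constant and function names are illustrative, and the roughly 1.8 exaflops of pre-upgrade Saturn V capacity is inferred from the article's totals rather than stated by Nvidia here.

```python
from __future__ import annotations

# Figures quoted in the article, in petaflops (PF) of "AI computing power".
PF_PER_DGX_A100 = 5
DGX_PER_SUPERPOD = 140
SUPERPODS_FOR_SATURN_V = 4
SATURN_V_TOTAL_PF = 4_600  # reported total after the upgrade (4.6 exaflops)


def superpod_pf(systems: int = DGX_PER_SUPERPOD,
                pf_per_system: int = PF_PER_DGX_A100) -> int:
    """Peak AI petaflops of one SuperPOD: system count times per-system rating."""
    return systems * pf_per_system


def saturn_v_breakdown() -> tuple[int, int]:
    """Return (PF added by the new pods, implied pre-upgrade PF of Saturn V)."""
    added = SUPERPODS_FOR_SATURN_V * superpod_pf()
    existing = SATURN_V_TOTAL_PF - added  # inferred, not an official figure
    return added, existing


if __name__ == "__main__":
    added, existing = saturn_v_breakdown()
    print(f"One SuperPOD: {superpod_pf()} PF")                  # 700 PF
    print(f"Added to Saturn V: {added / 1000} EF")              # 2.8 EF
    print(f"Implied existing Saturn V: {existing / 1000} EF")   # 1.8 EF
```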

For cloud computing companies like Amazon Web Services, Google and Microsoft Azure, there’s a slightly smaller option: the HGX A100.

It will feature four A100s instead of the DGX’s eight.

Moving further down the power scale is the EGX A100, which pairs a single GPU with a Mellanox ConnectX-6 SmartNIC and targets the edge market.
