Nvidia Upgrades AI Chip, Says Inference Costs Will Drop 'Significantly'

GH200 chip gets HBM3e memory to boost capacity and bandwidth.

Ben Wodecki, Jr. Editor

August 9, 2023


At a Glance

  • Nvidia unveiled an upgraded version of its GH200 AI chip, adding HBM3e memory to offer faster bandwidth.
  • It also unveiled Omniverse advancements and access to its supercomputers via Hugging Face.

Nvidia has upgraded one of its AI chip lines for improved generative AI performance and memory capabilities, which the chipmaker said will "significantly" lower inference costs.

Unveiled at the annual SIGGRAPH conference, the latest iteration of its GH200 Grace Hopper Superchip platform has been equipped with HBM3e memory to power AI workloads through increased capacity and bandwidth.

"We created a brand new processor for the era of generative AI," said CEO Jensen Huang during a keynote speech. "This processor is designed for scale out of the world's data centers."

He added that one can take any large language model (LLM) and "put it in this and it will inference like crazy." Moreover, "inference costs of LLMs will drop significantly," Huang claimed.

The GH200 with HBM3e is set to release in Q2 2024.

Nvidia GH200: Let’s get technical

There are two GH200s – the GH200 Grace Hopper Superchip, which starts shipping in September, and the newly unveiled GH200 with HBM3e.

They’re essentially the same device – both pair a Grace CPU with a GH100 Hopper compute GPU. The main difference is the HBM3e memory, which is designed to deliver higher bandwidth while using less power. HBM3e is the latest high-bandwidth memory standard, superseding HBM3.

Nvidia said that adding HBM3e memory gives the GH200 282GB of memory – and more memory means the chips can handle sizable workloads and AI models more easily. The current-gen GH200 comes with 96GB of HBM3 memory, delivering around 4 terabytes per second (TB/s) of bandwidth.

By adding HBM3e to the GH200, the hardware maker said the new chip’s memory is 50% faster than the current generation’s HBM3, delivering up to 10TB/s of bandwidth.
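Why would faster memory cut inference costs? At small batch sizes, generating each token of an LLM’s output requires streaming the model’s weights from memory, so tokens per second is bounded by memory bandwidth rather than raw compute. Here is a rough back-of-envelope sketch in Python using the bandwidth figures above – the 30B-parameter model is an illustrative assumption, and KV-cache traffic and real-world overheads are ignored:

```python
# Rough ceiling on LLM decoding speed: bandwidth-bound token generation
# must stream all model weights from memory once per token (batch size 1).
# The 30B-parameter FP16 model below is an illustrative assumption.

model_params = 30e9              # hypothetical 30B-parameter LLM
bytes_per_param = 2              # FP16 weights
model_bytes = model_params * bytes_per_param  # ~60GB, fits in 96GB of HBM3

for name, tb_per_s in [("GH200 with HBM3 (~4TB/s)", 4.0),
                       ("GH200 with HBM3e (~10TB/s)", 10.0)]:
    tokens_per_sec = (tb_per_s * 1e12) / model_bytes
    print(f"{name}: ~{tokens_per_sec:.0f} tokens/sec ceiling")
```

The numbers are crude, but the proportionality is the point: on a bandwidth-bound workload, 2.5x the bandwidth means roughly 2.5x the tokens served per chip, which is the mechanism behind the claimed drop in cost per token.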

Despite one chip offering more bandwidth than the other, Nvidia plans to support both – with the GH200 with HBM3e positioned as a premium offering.

The new chip can be connected with other Nvidia Superchip lines via NVLink, Nvidia’s high-speed interconnect, meaning different hardware can work together to power AI model deployments.

The new GH200 with HBM3e is compatible with Nvidia’s MGX server specification, meaning it can be used alongside DGX and HGX data center compute platforms.

Nvidia teams up with Hugging Face

Nvidia also announced at SIGGRAPH 2023 that it’s working with Hugging Face to give developers access to supercomputers to train their AI models.

The partnership will see Nvidia’s DGX Cloud AI supercomputing platform accessible via Hugging Face. Dubbed Training Cluster as a Service, Hugging Face’s new offering will give users the ability to train and tune AI models.

Each instance of DGX Cloud features eight H100 or A100 80GB Tensor Core GPUs for a total of 640GB of GPU memory per node, enabling high performance on AI workloads.
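Hugging Face has not published the Training Cluster as a Service interface beyond the announcement, but the workflow it hosts is ordinary Hugging Face fine-tuning. Below is a minimal sketch using the standard transformers Trainer API – the model name and data file are illustrative placeholders, not anything specified by the partnership:

```python
# Minimal causal-LM fine-tuning sketch with the Hugging Face Trainer API.
# Model and dataset names are placeholders; the hosted service's own
# interface was not detailed in the announcement.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder; swap in the LLM you want to tune
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("text", data_files={"train": "train.txt"})  # placeholder data

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The draw of the hosted offering is running jobs like this across a DGX Cloud node’s eight GPUs without provisioning the hardware yourself.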

Hugging Face CEO Clément Delangue said the partnership will “enable companies to take their AI destiny into their own hands with open source and with the speed they need to contribute to what's coming next."

Omniverse updates

Nvidia also announced a slew of updates to Omniverse, its metaverse application development platform.

Adobe Firefly – Adobe’s family of generative AI models – is set to be made available as APIs in Omniverse.

Nvidia also showcased a new central repository for accessing and sharing Omniverse extensions, called Omniverse Kit Extension Registry.

There are also new extended-reality (XR) developer tools allowing users to build spatial-computing options natively into their Omniverse-based applications.


About the Author

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
