October 25, 2022
The open source hardware server can be viewed online in VR
Meta’s AI team has unveiled its next-gen hardware AI platform, Grand Teton, to handle increasingly sophisticated and powerful ML models at scale.
Showcased at the recent Open Compute Project (OCP) summit, the liquid-cooled system is an open source hardware design that is also built to reduce the costs incurred when running AI applications.
“Today, some of the greatest challenges our industry is facing at scale are around AI,” according to a Meta blog. “Our AI and machine learning models are becoming increasingly powerful and sophisticated and need more high-performance infrastructure to match.”
For example, deep learning recommendation models have “tens of trillions of parameters and can require a zettaflop of compute to train.”
Greater compute capacity
The GPU-based system replaces Zion, using a single chassis where the prior model used multiple, to improve performance when handling neural networks. Grand Teton has around four times the bandwidth of its predecessor.
The latest piece of hardware has more memory capacity and improved compute capacity, Alexis Bjorlin, vice president of Meta infrastructure hardware, said in a speech at OCP 2022.
Grand Teton is built on Nvidia GPUs, specifically the upcoming H100 Tensor Core GPUs. The H100 can record performance results up to 4.5 times faster than Nvidia's current fastest AI chip, the A100, according to Nvidia.
Grand Teton uses its H100 GPUs to train and run deep learning models in the data center, with its improved bandwidth enabling larger clusters of systems for training or running larger AI models.
Having one integrated server “dramatically simplifies the deployment of systems, allowing us to install and provision our fleet much more rapidly, and increase reliability,” said Bjorlin.
Meta has created a platform allowing anyone to view digital representations of the company's data center servers. Users can view hardware such as Teton in VR or via the web and get virtually hands-on with its server platforms.