AI Business is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 3099067.


Google unveils world’s fastest ML training supercomputer

by Louis Stone

As measured by the MLPerf benchmark consortium

Google has used 4,096 Tensor Processing Units (TPUs) to build a supercomputer that it claims outperforms any other AI training system in operation today.

A series of benchmarks from the MLPerf consortium supported the claim, but also gave top marks to Nvidia for chip performance.

Build it and they will program

Google's TPUs cannot be purchased, only rented via Google Cloud. The record-breaking machine is available only internally, and it uses the fourth-generation TPU design that has yet to be deployed in Google’s cloud data centers. Because of this lack of wider availability, MLPerf ranks both as research projects.

Nvidia's GPUs, meanwhile, can be bought by anyone, so are categorized as commercial. Both companies dominated their respective sectors.

Google said that its system delivers over 430 petaflops of peak AI performance, and also features hundreds of CPU host machines, connected via an ultra-fast, ultra-large-scale custom interconnect.

"Training complex ML models using thousands of TPU chips required a combination of algorithmic techniques and optimizations in TensorFlow, JAX, Lingvo, and XLA," Google AI's Naveen Kumar said.

Nvidia, meanwhile, highlighted how its A100 GPUs outperformed Google's third-generation TPUs in some benchmarks. The company sells supercomputers-in-a-box called SuperPODs that can feature up to 2,048 A100 chips.

The A100 outperformed its predecessor, the V100, by 1.5-2.5x depending on the benchmark.

Some AI chip startups, including Cerebras and Graphcore, declined to take part in the competition.

“We were the only company to submit across all benchmarks with available systems,” Paresh Kharya, senior director of product management, data center computing at Nvidia, said.
