April 23, 2021
A huge chip (at a huge price)
The largest chip ever made has been upgraded. The Wafer Scale Engine-2 more than doubles the transistor count of Cerebras Systems' current flagship, making it the integrated circuit with the highest transistor count in the world.
The WSE-2 features 2.6 trillion transistors and 850,000 'AI-optimized' cores on a single chip. For comparison, the most complex GPU ever built features just 54 billion transistors.
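To put that gap in proportion, the short calculation below divides one figure by the other. It assumes, as the next paragraph implies, that the "most complex GPU" is the Nvidia A100. It is simple arithmetic on the quoted numbers, not a measure of performance.

```python
# Transistor counts as quoted in the article
wse2_transistors = 2.6e12  # Cerebras WSE-2
gpu_transistors = 54e9     # largest GPU cited (assumed here to be the Nvidia A100)

# How many times more transistors the WSE-2 carries
ratio = wse2_transistors / gpu_transistors
print(f"WSE-2 has about {ratio:.0f}x the transistors of the GPU")
```

This prints a ratio of roughly 48x. Transistor count says nothing on its own about throughput, which depends on how the transistors are used.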
Of course, while the Nvidia A100 costs upwards of $12,500, each Cerebras WSE retails for $2m and is only available as part of a larger CS-1 system that occupies 15 rack units.
One chunky boi
The core idea behind the WSE-1 and 2 is that training neural networks is slow on conventional hardware because data has to be shifted back and forth between the processor, accelerator, and external DRAM memory. Instead, why not build a big chip with everything on board?
The Department of Energy’s National Energy Technology Laboratory found the approach highly effective, at least for some workloads.
NETL pitted a CS-1 system against 16,384 Intel Xeon Gold 6148 cores in its Joule supercomputer for a very specific computational fluid dynamics workload – and the CS-1 proved 200 times faster.
Against a single GPU, it was approximately 10,000 times faster.
While the original WSE has a whopping 18GB of on-chip SRAM, that is less than cutting-edge AI applications are thought to require.
"A single 20-core Xeon 6148 socket has 27.5 MB of last-level (L3) cache, and at 16K cores the aggregate cache is 22.5 GB, substantially more than the memory of the CS-1," an NETL research paper explained. "But the Xeon caches seem to be less effective at deriving performance from the available SRAM."
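The researchers' 22.5 GB figure follows from the per-socket numbers they quote. The sketch below reproduces it, assuming decimal units (1 GB = 1,000 MB) and treating 16,384 cores as a fractional number of 20-core sockets, since 16,384 does not divide evenly by 20.

```python
# Figures quoted in the NETL paper
cores_total = 16_384        # Xeon cores used on Joule
cores_per_socket = 20       # Xeon Gold 6148
l3_per_socket_mb = 27.5     # last-level (L3) cache per socket, in MB

# Pro-rata socket count (819.2), then total L3 converted to decimal GB
sockets = cores_total / cores_per_socket
aggregate_l3_gb = sockets * l3_per_socket_mb / 1000

cs1_sram_gb = 18  # on-chip SRAM of the original WSE in the CS-1
print(f"{sockets:.1f} sockets -> {aggregate_l3_gb:.1f} GB of L3 vs {cs1_sram_gb} GB on CS-1")
```

The result, about 22.5 GB, matches the paper and confirms that the Xeon cluster had more fast on-chip memory in aggregate than the CS-1, even though it used it less effectively.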
The researchers noted: "Memory will limit the maximum problem size that can be solved on CS-1... [but] there are compelling HPC use cases for the CS-1, notwithstanding its modest memory capacity."
The WSE-2 raises the chip's memory to 40GB of on-chip SRAM, more than doubles memory bandwidth to 20 petabytes per second, and offers 220 petabits per second of aggregate fabric bandwidth.
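Using only the capacities given in this article, a quick comparison shows how far the memory ceiling NETL described has moved. The calculation compares the original WSE's 18GB with the WSE-2's 40GB, and sets the latter against the 22.5 GB of aggregate Xeon L3 cache cited in the NETL paper. It compares capacity only, not how effectively either system uses that memory.

```python
# On-chip memory capacities quoted in the article, in GB
wse1_sram_gb = 18          # original WSE (CS-1)
wse2_sram_gb = 40          # WSE-2
xeon_aggregate_l3_gb = 22.5  # NETL's aggregate L3 figure for 16,384 Xeon cores

# Generation-over-generation growth in on-chip SRAM
growth = wse2_sram_gb / wse1_sram_gb
print(f"On-chip SRAM grows {growth:.2f}x")

# Whether the new chip now exceeds the cluster's aggregate cache
print(f"WSE-2 exceeds the Xeon aggregate L3: {wse2_sram_gb > xeon_aggregate_l3_gb}")
```

This yields growth of about 2.22x, and the WSE-2's 40GB now exceeds the aggregate cache of the Xeon configuration NETL tested.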
“Less than two years ago, Cerebras revolutionized the industry with the introduction of WSE, the world’s first wafer scale processor,” Dhiraj Mallik, VP of hardware engineering at Cerebras, said.
“In AI compute, big chips are king, as they process information more quickly, producing answers in less time – and time is the enemy of progress in AI. The WSE-2 solves this major challenge as the industry’s fastest and largest AI processor ever made.”
Cerebras’ existing customers include NETL, GlaxoSmithKline, Tokyo Electron Devices, the Pittsburgh Supercomputing Center, and the University of Edinburgh.
TSMC, which manufactures the 7nm WSE-2 for Cerebras, plans to begin commercial production of wafer-scale processors, potentially for other customers.