The mighty slab of silicon is ready to take on those puny GPUs
by Max Smolaks 20 August 2019
American startup Cerebras Systems has left
stealth with its first ever product, a massive processor for artificial
intelligence workloads – and when we say massive, we really mean it.
The Wafer Scale Engine (WSE) measures nearly 8.5 by 8.5 inches and features 400,000 cores, all optimized for deep learning, accompanied
by a whopping 18GB of on-chip memory.
“Deep learning … has unique, massive, and growing computational requirements. And it is not well-matched by legacy machines like graphics processing units, which were fundamentally designed for other work,” said Dr. Andy Hock, director of product management for Cerebras.
“As a result, AI today is constrained not
by applications or ideas, but by the availability of compute. Testing a single
new hypothesis – training a new model – can take days, weeks, or even months
and cost hundreds of thousands of dollars in compute time. This is a major
roadblock to innovation.”
Cerebras was established in 2016 to design
hardware accelerators for deep learning, a subset of machine learning based on
artificial neural networks. The company is led by Andrew Feldman, who
previously served as CEO of SeaMicro and general manager of AMD’s server business.
AI workloads present a perfect application for
parallel computing, in which calculations are divided into smaller tasks that
are run at the same time, across many cores. The latest Intel Xeon CPUs have up
to 56 cores. Nvidia Tesla V100, a top-of-the-line GPU designed specifically for
AI, has 640 Tensor cores – modern GPUs for gaming can have up to 3,000 cores,
but are unsuitable for deep learning.
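To make the idea of dividing a calculation into smaller tasks concrete, here is a minimal Python sketch. It splits a dot product, the basic operation inside a neural network layer, into slices, computes each slice in a separate worker, and adds up the partial results. The function names and the choice of four workers are illustrative assumptions, not anything published by Cerebras, Intel or Nvidia. Python threads are used only to show the decomposition; they will not give the speedup that dedicated hardware cores do.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, num_chunks):
    """Divide a list into num_chunks nearly equal contiguous slices."""
    size, remainder = divmod(len(values), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `remainder` chunks take one extra element each.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def partial_dot(pair):
    """Dot product of one slice of the inputs with the matching weights."""
    xs, ws = pair
    return sum(x * w for x, w in zip(xs, ws))


def parallel_dot(inputs, weights, num_workers=4):
    """Split a dot product across workers, then combine the partial sums."""
    pairs = zip(split_into_chunks(inputs, num_workers),
                split_into_chunks(weights, num_workers))
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(partial_dot, pairs))


if __name__ == "__main__":
    # Same answer as a plain loop, but computed as four independent tasks.
    print(parallel_dot(list(range(10)), list(range(10))))
```

The more cores a chip has, the more slices it can work on at once, which is the case the article makes for putting 400,000 of them on one piece of silicon.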
Cerebras has managed to squeeze 400,000
cores into a single chip package, which means it should be able to accomplish
more demanding tasks, and do this quicker than conventional chips.
The wafer scale approach has another benefit
– due to the laws of physics, connections between cores on a single chip are
much, much faster than connections between cores across separate chips, even if
they are installed in the same system.
Even if a conventional chip or system could
match the number of cores on WSE, it would
be hard to match the speed at which it can shuffle information between its
cores and system memory – something that Cerebras is doing using a proprietary interconnect.
The startup says its chip contains 3,000
times more on-chip memory and can deliver 10,000 times more memory bandwidth
than the industry’s largest GPU.
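As a rough sanity check, the memory ratio can be turned around to see what it implies about the GPU being compared. The sketch below uses only the two figures quoted in this article (18GB on the WSE and a 3,000-fold advantage) and assumes decimal units, with 1GB equal to 1,000MB. Cerebras did not name the GPU here, so the result is an implied figure and not a published specification.

```python
WSE_ON_CHIP_MEMORY_GB = 18   # figure quoted in the article
CLAIMED_MEMORY_RATIO = 3_000  # "3,000 times more on-chip memory"


def implied_gpu_on_chip_memory_mb(wse_gb=WSE_ON_CHIP_MEMORY_GB,
                                  ratio=CLAIMED_MEMORY_RATIO):
    """On-chip memory, in MB, that the comparison GPU would need to have
    for the claimed ratio to hold (assumes 1 GB = 1,000 MB)."""
    return wse_gb * 1000 / ratio


if __name__ == "__main__":
    print(f"Implied GPU on-chip memory: {implied_gpu_on_chip_memory_mb()} MB")
```

In other words, the claim amounts to comparing the WSE's 18GB with a GPU that holds a few megabytes of memory on the die itself, not with the much larger external memory attached to a GPU card.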
“Altogether, the WSE takes the fundamental properties of cores, memory, and interconnect to their logical extremes,” Hock said.
The Cerebras software stack has been
integrated with popular open source machine learning frameworks like TensorFlow
and PyTorch. All we want to see now is a server platform or appliance that can house,
power, and most importantly, cool this beast of a processor.