December 1, 2022
Cerebras Systems is one of those rare companies in Silicon Valley that continues to raise the bar – not easy in a region full of innovative startups. It has been unveiling ground-breaking technologies since 2019, when the AI hardware startup announced that it had built the world’s largest and fastest processor.
Cerebras’ first Wafer Scale Engine (WSE) chip, designed for AI applications, measures eight inches by eight inches, making it 56 times larger than the largest GPU, and packs 1.2 trillion transistors and 400,000 computing cores.
Then in 2021, the company one-upped itself with its second-generation processor. The WSE-2 chip doubles the performance of the original with 2.6 trillion transistors and 850,000 cores.
Last month, Cerebras did it again. It introduced the Andromeda AI supercomputer, which it touts as one of the world’s fastest AI supercomputers. The startup built the system by clustering 16 of its CS-2 servers, each running a WSE-2 chip.
In an interview with AI Business, Cerebras CEO Andrew Feldman explains what makes Andromeda special, how it has already helped with COVID-19 research and why he believes his technology is better than systems powered by GPUs.
What follows is an edited version of that conversation.
AI Business: What’s your business model? Is it selling processors or building supercomputers for organizations to use?
Andrew Feldman: The business model is absolutely not selling chips. We sell systems. We sell boxes. We sell time on boxes. We allow you to subscribe to equipment, whether it’s on our premises or on your premises.
We have a variety of consumption models. Some customers like to have equipment on-premises as a capital expense. Others prefer operating expenses and don’t care where the equipment is; they just want to jump on the cloud and SSH into something.
AI Business: Is Andromeda the first supercomputer you’ve built?
Feldman: We saw an opportunity to link 16 of our systems together. We had a model that said we could link 192 of them together, but 16 is the starting point. At 16 systems, you have 13.5 million cores. So Andromeda is one of the largest AI computers ever built.
AI Business: What makes Andromeda different from other supercomputers?
Feldman: It’s a couple of things. First, it’s optimized for AI work. Everything about it is tuned for AI problems. It’s not a good machine if you want to do traditional supercomputing work. It is the perfect machine if you want to do big AI. What that means is it supports very large parameter counts. Hundreds of billions of parameters.
It runs in an approach called data parallel, which allows for a quick, easy setup. It allows you to avoid a lot of the heavy lifting that’s traditionally done in supercomputing. You don’t have to worry about parallelizing your code. You don’t have to worry about any of the things that have plagued big compute. You just write in Python, PyTorch or TensorFlow, and it goes.
Usually it takes weeks of massaging to get your work onto a supercomputer. Here, you type four things into a Jupyter Notebook, and you’ve sent your job to the largest AI supercomputer around.
So, it’s easier to use, bigger, faster and dedicated to AI.
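The data-parallel approach Feldman describes is simple to sketch. The toy NumPy example below (an illustration of the general technique, not Cerebras’ actual software stack) splits a batch across four workers, computes per-worker gradients for a linear model, and averages them, which is mathematically equivalent to one gradient step on the full batch:

```python
import numpy as np

def gradient(w, X, y):
    # Gradient of mean squared error for a linear model y ≈ X @ w
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 3)), rng.normal(size=64)
w = np.zeros(3)

# "Data parallel": each worker sees one equal shard of the batch ...
shards = zip(np.array_split(X, 4), np.array_split(y, 4))
grads = [gradient(w, Xs, ys) for Xs, ys in shards]
# ... and the averaged shard gradients equal the full-batch gradient
avg_grad = np.mean(grads, axis=0)

assert np.allclose(avg_grad, gradient(w, X, y))
```

Because each worker runs the same unmodified model on its own slice of the data, the user’s training code needs no manual parallelization, which is the ease-of-use point Feldman is making.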
AI Business: What was your strategy behind the design of Andromeda? How did you build it?
Feldman: When we built the system, we knew we could cluster the machines together and achieve exceptional performance and exceptional scale. Andromeda is a cluster of 16 CS-2 servers tied together with our technology. We have two sets of technology that allow us to tie machines together into very large clusters. One is SwarmX, a fabric that connects the machines. The other is MemoryX, a parameter store that holds the very large parameter sets of these giant language models.
What you see here is one of the rarest characteristics in a supercomputer. When you go from one to two machines in Andromeda, you get 1.9 times the performance. When you go from two to four machines, you get 3.94 times. When you go to eight, you get 7.87, and when you go to 16, it’s 15.5. So what you get is this extraordinary characteristic called near-perfect linear scaling.
You get this extraordinary result of almost 16 times the performance from 16 machines. That is extremely rare. Usually there’s a tremendous penalty as you add machines: dramatically sublinear scaling.
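Those figures translate directly into scaling efficiency (measured speedup divided by machine count), where 1.0 would be perfect linear scaling. A quick check on the numbers quoted above:

```python
# Speedups Feldman quotes for 2, 4, 8 and 16 machines
speedups = {2: 1.9, 4: 3.94, 8: 7.87, 16: 15.5}

for n, s in speedups.items():
    eff = s / n  # 1.0 would be perfect linear scaling
    print(f"{n:2d} machines: {s:5.2f}x speedup, {eff:.1%} efficiency")
```

Every configuration stays above 94% efficiency, which is what he means by "almost perfect linear scale."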
AI Business: What makes your chip design better than, say, Nvidia’s GPUs? What makes WSE-2 unique and fast?
Feldman: Ours is the size of a dinner plate, and most traditional chips are the size of postage stamps.
In 2019, our first chip was 16 nanometers. Then we shrunk it to 7 nanometers in 2021. It’s 46,225 square millimeters. The largest graphics processor is 820 square millimeters, so it’s 56 times larger. We have 850,000 AI cores. The largest Nvidia part has just under 7,000, so we have 123 times more cores.
We have 40 gigabytes of on-chip memory. They have 40 megabytes. So we have 1,000 times more memory on-chip. As for memory bandwidth, which is a fundamental problem in AI and one of the binding constraints, we have 20 petabytes per second. They have 1,500 gigabytes per second. So we have 12,733 times more memory bandwidth.
As for fabric bandwidth, which connects the cores together, we have 220 petabits per second. They have 600 gigabytes per second. So we have 45,000 times more bandwidth to connect cores together. What you see is that we have vastly more computational resources.
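Most of these multipliers follow directly from the quoted specs. A quick sanity check (the 6,912-core figure is an assumption matching Nvidia’s A100, consistent with Feldman’s "just under 7,000"):

```python
wse2 = {"area_mm2": 46_225, "cores": 850_000, "sram_bytes": 40e9}
gpu  = {"area_mm2": 820, "cores": 6_912, "sram_bytes": 40e6}  # assumed A100 figures

print(f"die area:       {wse2['area_mm2'] / gpu['area_mm2']:.0f}x")      # ~56x
print(f"cores:          {wse2['cores'] / gpu['cores']:.0f}x")            # ~123x
print(f"on-chip memory: {wse2['sram_bytes'] / gpu['sram_bytes']:.0f}x")  # 1000x
```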
AI Business: And you use AMD chips, too, correct?
Feldman: We do. When you have a machine that large, you need to feed it an absurd amount of data. We use 18,176 AMD Epyc cores to feed it data. They’re the ones that hold the data and send the data to our accelerator – to Andromeda – to crunch on it.
AI Business: I read a story that said Andromeda doesn’t qualify for the Top 500 List because it doesn’t do 64-bit double precision. On performance alone, where would it reside in the Top 500 List if it were allowed?
Feldman: In the top 10. But we are focused on AI work, and AI work uses a different numerical format: we use 16-bit and 32-bit precision. The giant supercomputers like Frontier and Aurora use 64-bit double precision. Those are designed for different work, so it’s not exactly a fair comparison.
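The format difference is easy to see in NumPy: 16-bit floats carry roughly three decimal digits of precision near 1.0, while 64-bit doubles carry about sixteen, which is why a 64-bit benchmark like the Top 500’s measures something AI training typically doesn’t need:

```python
import numpy as np

# Spacing between adjacent representable values near 1.0
print(np.finfo(np.float16).eps)   # ~0.000977
print(np.finfo(np.float64).eps)   # ~2.2e-16

# A small update vanishes entirely in 16-bit arithmetic ...
assert np.float16(1.0) + np.float16(1e-4) == np.float16(1.0)
# ... but survives in 64-bit
assert np.float64(1.0) + np.float64(1e-4) > np.float64(1.0)
```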
AI Business: How much did it cost to build Andromeda?
Feldman: If you want us to build one for you, it would cost around $30 million to buy.
AI Business: Who is the target audience for Andromeda? I understand it’s free for universities.
Feldman: We’ve made it available to graduate students and academic researchers to further their work. They apply, and we schedule time for them on the system.
We have a dozen enterprises so far that have rented time on the system.
AI Business: Can you give an example of a use case?
Feldman: Argonne National Laboratory published a paper that won the Gordon Bell Award as the best paper in the supercomputing industry. They analyzed the COVID genome by putting the entire COVID genome in the sequence window of a large language model.
What they were trying to do was take the initial viral DNA of COVID and predict mutations, and from the initial Alpha variant they were able to predict the Delta variant.
One of the things they noted was that they tried to do this on GPUs and our system. They noted that for the larger model sizes – 2.5 billion and 25 billion parameters – training was infeasible on a GPU cluster due to out-of-memory errors during attention computation.
They had a 2,000-node GPU cluster at their disposal, and they simply couldn’t do this work, and our 16-node Andromeda could. So right out of the gate, we’re doing work that is not possible for other machines to do.
AI Business: What are your goals with Andromeda?
Feldman: Solve hard problems in AI that matter. Better understanding COVID matters. We’re going to try other viruses – the flu, the cold. We’re going to work with leading pharmaceutical companies and leading research organizations to really push the state of knowledge forward. We’ve got a big queue now of people who want to use it.