DCW ’23: Deploying Generative AI Without GPUs or Supercomputers

Developers can use secondhand, CPU-based legacy hardware to do the compute

Deborah Yao, Editor

May 15, 2023

Data Center World 2023 panel on ChatGPT and generative AI
From left: Neu.ro CEO Uri Soroka and theMind partners Vova Soroka, Constantine Goltsev

At a Glance

  • AI experts say a company does not need a large language model to develop generative AI apps.
  • A large law firm, for example, can create a semantic search engine for about $30,000 to $50,000.
  • Developers can also use secondhand CPU-based hardware to do the compute.

Technological innovations that generate a lot of hype typically have to face harsh reality at some point when practitioners begin deployment. For AI, especially generative AI, that reality is starting to sink in.

“Training (a large language model) is extremely costly,” said Constantine Goltsev, partner at AI/ML solutions agency theMind, during a recent panel at Data Center World 2023 in Austin, Texas.

With ChatGPT’s 175 billion parameters, he said, OpenAI’s model had to perform roughly 175 billion calculations for each token of input it processed to produce results, consuming gigawatt-hours of energy overall. For GPT-4, according to Goltsev, OpenAI used 12,000 to 15,000 Nvidia A100 chips (each costing $10,000) on Azure and ran the compute for months.
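To put these figures in perspective, here is a back-of-the-envelope sketch. It assumes the common rule of thumb that a dense transformer’s forward pass costs about 2 floating-point operations per parameter per token (one multiply plus one add); that approximation is not from the panel. The GPU count and unit price are the ones quoted above.

```python
# Back-of-the-envelope arithmetic for the figures quoted by Goltsev.
# Assumption: a dense transformer's forward pass costs ~2 FLOPs per
# parameter per token (a common rule of thumb, not an exact count).

FLOPS_PER_PARAM_PER_TOKEN = 2


def forward_flops_per_token(num_params: float) -> float:
    """Approximate floating-point operations needed to process one token."""
    return FLOPS_PER_PARAM_PER_TOKEN * num_params


def fleet_cost(num_gpus: int, unit_price_usd: int) -> int:
    """Up-front hardware cost of a GPU fleet at a fixed unit price."""
    return num_gpus * unit_price_usd


if __name__ == "__main__":
    print(f"~{forward_flops_per_token(175e9):.2e} FLOPs per token")
    # GPU range and $10,000 unit price as quoted for GPT-4.
    for gpus in (12_000, 15_000):
        print(f"{gpus:,} A100s: ${fleet_cost(gpus, 10_000):,}")
```

Under this rule of thumb, each token costs about 350 billion operations, and the quoted GPU range alone implies $120 million to $150 million in hardware.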

The good news is that companies do not have to use gigantic language models: for many tasks, much smaller open-source ones can deliver results comparable to ChatGPT’s, and sometimes better.

“You don’t necessarily need the large language model on the industrial scale, like ChatGPT or GPT-4, to do a lot of useful stuff,” Goltsev said. “You can take smaller academic models or open source models on the order of 6 billion parameters, 3 billion parameters, and then you can fine-tune them using exactly the same methods that are used to create ChatGPT. And the results are very decent.”

For example, if a large law firm wants to build a semantic search engine that can sift through a mound of legal case files comprising terabytes of data, it can go to AWS and rent a few of its large instances, with about eight A100 cards and plenty of memory and storage. The cost would be about $30 to $40 an hour.
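Semantic search ranks documents by meaning rather than exact keywords: each document and each query is turned into an embedding vector, and documents are ranked by how similar their vectors are to the query’s. The minimal sketch below shows only the ranking step, using cosine similarity. The three-dimensional vectors and case-file names are hypothetical placeholders; a real system would generate embeddings with a language model and, at terabyte scale, store them in a vector index rather than a Python dictionary.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: list[float], doc_vecs: dict[str, list[float]], top_k: int = 3
) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings; real embedding models emit hundreds of dimensions.
case_files = {
    "contract_dispute_2019": [0.9, 0.1, 0.0],
    "patent_infringement_2021": [0.1, 0.9, 0.2],
    "lease_breach_2020": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of "breach of contract"

for doc_id, score in rank_documents(query, case_files, top_k=2):
    print(f"{doc_id}: {score:.3f}")
```

The two contract-related files rank above the patent case even though no keywords were compared, which is the property that makes semantic search useful for large case archives.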

Then, fine-tuning a mid-size GPT model with 64 billion parameters for a couple of weeks should cost the law firm about $30,000 to $50,000. “That will produce a pretty decent result for you to work with,” Goltsev said.
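One rough way to see how an hourly rate becomes a five-figure bill is to multiply rate × hours × number of instances. The sketch below assumes, since the article does not say, that the $30 to $40 hourly figure applies per instance and that “a couple of weeks” means 14 days of continuous use; the instance counts are illustrative only.

```python
HOURS_PER_DAY = 24


def training_cost(rate_per_hour: float, days: float, instances: int = 1) -> float:
    """Total on-demand cost of running `instances` machines around the clock."""
    return rate_per_hour * HOURS_PER_DAY * days * instances


if __name__ == "__main__":
    days = 14  # assumption: "a couple of weeks" of continuous fine-tuning
    for instances in (1, 3, 4):  # illustrative counts, not from the source
        low = training_cost(30, days, instances)
        high = training_cost(40, days, instances)
        print(f"{instances} instance(s): ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions, one instance costs about $10,000 to $13,500 over two weeks, so three to four instances land roughly in the $30,000 to $50,000 range Goltsev cited.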

Access, though, can be tricky. “Even now, some people are reporting it is difficult to just get decent GPU compute out of AWS,” Goltsev said. “So you start scrambling, you start looking at maybe Google or Azure. There is not a lot of places where this capacity can be found. A lot of people think of going on-prem or using bare metal providers to build their own systems.”

Beyond hyperscalers like AWS, there are other places to get these services, but “they are just not well structured,” Goltsev said.

Using legacy CPU-based hardware

Another bit of good news is that with some modifications, legacy CPU-based hardware can suffice, according to Goltsev.

“The news is very good,” he said. “All of these components that have been decommissioned by hyperscalers, you can actually create very decent systems out of them with proper engineering.”

Goltsev had a client that used a 64-billion-parameter model and ran it on a used computer with 256 GB of RAM. The team wrote the application in C++, made some modifications, and used quantization techniques to shrink the model’s memory footprint and make it run faster.
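Quantization stores each weight in fewer bits. The first part of the sketch below estimates the memory taken by the weights of a 64-billion-parameter model at several precisions (activations and runtime overhead are ignored): at 32-bit precision the weights alone would consume all 256 GB, while at 8 or 4 bits they need 64 GB or 32 GB. The second part shows one simple scheme, symmetric per-tensor int8 quantization. The article does not say which technique the client used, so this is an illustration rather than their method, and it is written in Python for readability even though the client’s application was in C++.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 codes."""
    return [scale * v for v in q]


if __name__ == "__main__":
    for bits in (32, 16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(64e9, bits):,.0f} GB")

    weights = [0.50, -1.27, 0.03, 0.90]
    q, scale = quantize_int8(weights)
    print("int8 codes:", q)
    print("restored:", dequantize_int8(q, scale))
```

Each weight now occupies one byte instead of four, and the rounding error per weight is at most half of `scale`; on CPUs, smaller weights also mean less data to move through memory, which is a large part of why quantized models run faster.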

“We got some decent results,” Goltsev said. The CPU generated about three tokens per second, or roughly two words. While the output is “not real time” and the model is “not convenient to chat with it,” he said, offline it can handle tasks such as analyzing documents and produce output that is “perfectly feasible.”
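Three tokens per second sounds slow, but for offline document analysis the practical question is how long a job takes. The sketch below converts a token rate into wall-clock time. It assumes the common rule of thumb of roughly 0.75 English words per token (consistent with three tokens being about two words), and the 500-word output length is a hypothetical example.

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English text (assumption)
SECONDS_PER_DAY = 24 * 3600


def generation_seconds(words: int, tokens_per_second: float) -> float:
    """Time to generate `words` of output at a fixed token rate."""
    tokens = words / WORDS_PER_TOKEN
    return tokens / tokens_per_second


if __name__ == "__main__":
    seconds = generation_seconds(500, tokens_per_second=3)
    print(f"500-word summary: ~{seconds / 60:.1f} minutes")
    print(f"~{SECONDS_PER_DAY / seconds:.0f} such summaries per machine per day")
```

At about 3.7 minutes per 500-word summary, a single machine could produce a few hundred summaries a day. This ignores the time spent reading each input document, so real throughput would be lower, but it illustrates why batch jobs remain workable on CPUs.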

“And it is a large model that can do a lot of work,” Goltsev added. So “you do not need the latest A100 cards to do these things if you do proper software engineering.”

Standard web dev project vs. AI project

However, be aware that deploying AI models is harder than standard web development.

In standard web development, developers write code and then deploy the program. Even if they deploy it on just a single machine, “it’s probably going to work,” said Vova Soroka, partner at theMind. Afterward, they move on to large production deployments.

“We all know how to do it,” Soroka said. “We know how to scale it vertically, horizontally, what not.”

But AI is trickier. There are no established templates for deploying it. “All the standard things we did for web development like DevOps … everybody knows how to do that. Everybody has templates and it is so easy and nice,” Soroka said. With AI, he has seen clients just “wing it” and try to figure out which machines they can use and how to provision GPU cards.

They ask themselves, “What are we doing?” Soroka said. “It is all so non-standard. It is almost painful.”

About the Author(s)

Deborah Yao

Editor

Deborah Yao runs the day-to-day operations of AI Business. She is a Stanford grad who has worked at Amazon, Wharton School and Associated Press.
