How Smaller LLMs Could Slash Costs of Generative AI

More efficient AI models are already being rolled out

Sascha Brodsky, Contributor

October 10, 2023

At a Glance

  • The high costs of LLMs that power generative AI are a growing concern. Smaller models could be a solution.
  • Smaller parameter counts need less computational power, leading to lower hardware, operational and cloud costs.
  • Fine-tuned language models could soon match the performance of larger models at a fraction of the cost.

The high costs of large language models (LLMs) that power generative AI are a growing concern, but smaller models could be a solution.

“The rise of LLMs like GPT-4 has shown extraordinary leaps in performance, and with this advancement comes increased costs,” Adnan Masood, the chief AI architect of the technology company UST, said in an interview.

“The computational intensity of LLMs, due to their sheer size and the computational needs of billions of parameters, requires extensive power. This intense computation translates to greater energy consumption, increasing the operational cost and environmental impact,” he said. “With model sizes exceeding GPU memory limits, there is an ensuing demand for specialized hardware or complex model parallelism, further compounding infrastructure costs.”

Smaller language models can reduce costs and improve efficiency when they are fine-tuned, Masood said. He noted that there are techniques like distillation and quantization in LLMs to compress and optimize models. Distillation involves training a smaller model using the outputs of a larger one, and quantization reduces the precision of the model's numerical weights to make it smaller and faster.
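
To make those two techniques concrete, here is a minimal, illustrative sketch in PyTorch. The tiny feed-forward "teacher" and "student" networks, temperature, and layer sizes are stand-ins for illustration, not anything Masood or UST describes: the student is trained to match the teacher's softened outputs (distillation), then shrunk further with 8-bit dynamic quantization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: the "teacher" is larger, the "student" far smaller.
teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Soften both output distributions with temperature T, then push the
    # student's distribution toward the teacher's via KL divergence.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(32, 128)  # one stand-in batch of inputs

with torch.no_grad():
    teacher_logits = teacher(x)  # the large model's outputs become soft targets

loss = distillation_loss(student(x), teacher_logits)
loss.backward()
optimizer.step()

# Quantization: convert the distilled student's linear-layer weights to
# 8-bit integers, trading a little precision for a smaller, faster model.
quantized_student = torch.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8
)
```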

Smaller models’ “reduced parameter count naturally demands less computational power, ensuring faster inferences and potentially shorter training durations,” he added. “This smaller footprint fits well within conventional GPU memory, eliminating the need for specialized, more expensive hardware setups. With the reduced computational and memory usage, energy consumption drops, directly trimming operational costs. When leveraging APIs for proofs of concept or prototyping in production workloads, the low per-token pricing proves beneficial during scaling. Yet, when applications experience rapid growth, relying solely on larger language models can lead to exponential cost increases.”
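
The GPU-memory point lends itself to back-of-envelope arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below applies that rule of thumb to illustrative model sizes; it ignores activations and the KV cache, which add real overhead in deployment.

```python
# Rough rule of thumb: weight memory (GB) ~= parameters (billions) x bytes per parameter.
# Activations and the KV cache are excluded, so real requirements are higher.
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param

for params_b in (70, 13, 7):
    fp16 = weight_memory_gb(params_b, 2)  # 16-bit weights
    int8 = weight_memory_gb(params_b, 1)  # 8-bit quantized weights
    print(f"{params_b}B params: ~{fp16:.0f} GB at fp16, ~{int8:.0f} GB at int8")

# A 70B model at fp16 (~140 GB) overflows even an 80 GB accelerator and forces
# model parallelism across several cards; a 7B model (~14 GB) fits on one.
```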

Smaller language models could also slash cloud infrastructure costs, Matt Barrington, Americas emerging technology leader for EY, said in an interview. For instance, fine-tuning a domain-specific model on a cloud-based service requires fewer resources, reducing training time and cost. Companies can also shift AI resources to other crucial areas, such as workloads closer to the end user.

“By utilizing compact language models in edge computing scenarios, enterprises minimize the dependency on costly cloud resources, leading to cost savings,” he added.

More efficient AI models are already being rolled out. Examples of smaller models include recent ones such as phi-1.5, which, despite its compact size, rivals the performance of larger models like GPT-4, Masood said. There are also domain-specific models like Med-PaLM 2, tailored to the health care and life sciences industries, and Sec-PaLM, meant for security applications.

“Models like Llama 2 70B, which is priced considerably lower than contemporaries such as Google's PaLM 2, are emerging as cost-effective solutions,” Masood added, calling the prices “a stark reduction from earlier models. Meta's 13-billion-parameter LLaMA even outperformed the larger GPT-3 on most benchmarks.”

Initiatives such as the BabyLM Challenge at Johns Hopkins University aim to make small models as effective as LLMs. Amazon runs a marketplace for smaller models that can be customized with a company's data, and providers such as Anyscale and MosaicML sell access to models like the 70-billion-parameter Llama 2 at lower prices, underscoring the shift toward capable, cost-effective models.

Surging Large Language Model Costs

There is a pressing need to cut the costs of LLMs. One significant expense is the GPUs used to train them. Perhaps the most sought-after is Nvidia’s H100, which fetches $30,000 or more apiece, Muddu Sudhakar, CEO of Aisera, noted in an interview. There is a waitlist for these GPUs, with some VCs even using access to them as bait to attract startups for funding.

Even for those who get the GPUs, the business must generate enough revenue to cover their costs, Sudhakar said. A recent blog post from VC firm Sequoia notes a sizable monetization gap, which could be an issue for the generative AI market.

“Once you obtain the GPUs, you will need data scientists, who are very tough to recruit. The comp packages are also substantial,” he added. “Finally, operationalizing LLMs is expensive in terms of processing interactions and managing and upgrading the models for prompt injections, security issues, hallucinations, etc.”

Masood predicted that fine-tuned LLMs would soon match the performance of larger models at a fraction of the cost. He said the open-source community has been addressing practical challenges with techniques like LongLoRA that show how context windows can be dramatically extended.
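
LongLoRA combines low-rank adapters with an efficient attention scheme to extend context cheaply; the sketch below illustrates only the underlying LoRA idea that makes such fine-tuning affordable, not LongLoRA's actual implementation. The layer sizes, rank, and scaling factor are illustrative assumptions: the large pretrained weight is frozen, and only a small low-rank update is trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + scale * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the big pretrained weight stays frozen
        # B starts at zero so training begins from the pretrained behavior.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable:,} of {total:,}")  # a tiny fraction
```

Because only the adapter matrices receive gradients, optimizer state and gradient memory shrink accordingly, which is what makes fine-tuning large models tractable on modest hardware.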

“If the trajectory is any indicator, the coming era might witness a synthesis of open-source models and smaller LLMs, forming the backbone of the next-generation language modeling ecosystem,” he added.

About the Author(s)

Sascha Brodsky

Contributor

Sascha Brodsky is a freelance technology writer based in New York City. His work has been published in The Atlantic, The Guardian, The Los Angeles Times, Reuters, and many other outlets. He graduated from Columbia University's Graduate School of Journalism and its School of International and Public Affairs. 
