Small language models are rising in popularity, but they have problems too. Here's how to address them.

Tom Taulli, Contributor

February 16, 2024

At a Glance

  • Small language models are on the rise, but developers using them face issues around performance, expertise and security.

Hugging Face CEO Clem Delangue said this about small language models (SLMs) recently: “My prediction: in 2024, most companies will realize that smaller, cheaper, more specialized models make more sense for 99% of AI use-cases. The current market & usage is fooled by companies sponsoring the cost of training and running big models behind APIs (especially with cloud incentives).”

This is backed up by the momentum in Microsoft’s business. In the latest earnings call, the company announced that customers such as Anker, Ashley, AT&T, EY, and Thomson Reuters were exploring SLMs for generative AI app development. CEO Satya Nadella has noted: “Microsoft loves SLMs.”

Why the excitement? SLMs, which are generally five to 10 times smaller than large language models (LLMs), offer compelling benefits.

“They use less energy and have lower latency,” said Sudhakar Muddu, who is the CEO and cofounder of Aisera. “The training and inference times are also quicker. And the small size means you can use an SLM on the edge. But the most important benefit for the enterprise is that they can be tailored for certain domains and industries. This is where you get the gains in productivity.”

However, he does point out that there are challenges with SLMs. The technology is still in the nascent stages and is complex.

Here’s a look at some of the most common issues and what can be done about them.

#1 - Performance

SLMs are closing the gap with LLMs in areas such as accuracy. But the differences can still be noticeable and can result in a lower-performing application.

“Their limited understanding and contextual awareness often mean they struggle with complex or niche topics, leading to responses that may not be as relevant or coherent as those generated by larger models,” said David Guarrera, a principal with EY Americas Technology Consulting. “This limitation impacts not just the depth of knowledge these models can access but also their ability to maintain context over longer interactions.”

This is why enterprises should do due diligence on the tradeoffs between SLMs and LLMs. The performance of an SLM can also be significantly improved with fine-tuning. In other words, SLMs often do not make much sense when used out of the box.

#2 - Expertise

A common way to optimize an SLM is to use retrieval-augmented generation (RAG). This involves using semantic search, such as with vector databases, to retrieve relevant data and supply it to the model as context. This can improve the accuracy of the generated content and allow for more up-to-date results.
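To make the retrieval step concrete, here is a minimal sketch in Python. It is illustrative only: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the names `embed`, `cosine`, `retrieve`, `build_prompt` and the sample `DOCS` are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: word counts (an assumed stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Put the retrieved passages in front of the question before calling the model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge base, invented for illustration.
DOCS = [
    "Refunds are processed within 14 days of the return request.",
    "The warranty covers battery defects for two years.",
    "Support is available by chat on weekdays.",
]

if __name__ == "__main__":
    print(build_prompt("How long does the warranty cover the battery?", DOCS, k=1))
```

In a production setup, the string returned by `build_prompt` would be sent to the SLM, and the toy similarity search would be replaced by a query against a vector database.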

“Any backend developer worth their salt can build an MVP or V1 of a RAG GenAI setup with tools today,” said Cory Hymel, who is the vice president of research and innovation at Crowdbotics.

But building beyond RAG requires someone with a deeper understanding of AI, and this talent is in short supply.

“The next step in complexity is fine-tuning a model, in which you take an existing AI model and introduce new training data to hone it to a specific data set,” said Hymel. “This is more complex because it requires custom data curation, tagging, and running the training, which goes beyond typically generic backend engineer skill sets.”

Enterprise generative AI applications may also involve multiple SLMs, which adds to the complexity. For example, teams will need to work with orchestration tools such as Kubernetes.

“As you can imagine, training a single model is one thing, but looking to train multiple models to work together is much more difficult. With this more complex architecture, you can increase total cost of ownership, time to market, and initial upfront investment,” Hymel said.

“Similar to finding AI talent in general, finding the subset of that group with experience in building these types of systems is even more slim. We expect that the large model providers such as Microsoft, OpenAI, and others will begin offering ‘orchestration as a service’ to help gain larger market adoption.”

#3 - Security

Many SLMs are open source. This allows for more control over security. For example, an enterprise can deploy an SLM in an on-premises environment.

However, there are still notable issues. “The foremost security risk when using a fine-tuned SLM is data theft and privacy concerns,” said Mehrin Kiani, who is an ML scientist at Protect AI. “This is especially prevalent if an SLM is fine-tuned on proprietary and confidential data.”

In fact, the risk of such attacks is heightened because the code is open source. The maintainers of these projects also may not have sufficient resources for security. Such factors make it easier for an attacker to target the SLM.

“Training models on adversarial examples and implementing detection mechanisms can help identify and mitigate malicious inputs,” said Tal Furman, who is the director of data science and deep learning at Deep Instinct. “Other best practices are to implement strong access controls, logging, and monitoring for open-source models.”
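As a rough illustration of the access control and logging practices mentioned above, the following Python sketch wraps a model call with a role check, a simple input-size check, and audit logging. Everything here is an assumption for illustration: the `guarded_generate` function, the `ALLOWED_ROLES` set, the `MAX_PROMPT_CHARS` limit, and the `model_fn` callable are hypothetical and not taken from any particular product.

```python
import logging

logger = logging.getLogger("slm_gateway")

# Hypothetical policy values, chosen only for this example.
ALLOWED_ROLES = {"analyst", "admin"}
MAX_PROMPT_CHARS = 2000

def guarded_generate(user, role, prompt, model_fn):
    """Check access and input size, log the request, then call the model."""
    if role not in ALLOWED_ROLES:
        logger.warning("denied user=%s role=%s", user, role)
        raise PermissionError(f"role {role!r} may not query the model")
    if len(prompt) > MAX_PROMPT_CHARS:
        logger.warning("rejected oversized prompt user=%s chars=%d", user, len(prompt))
        raise ValueError("prompt exceeds the configured size limit")
    # Log metadata only, so confidential prompt text stays out of the logs.
    logger.info("request user=%s role=%s chars=%d", user, role, len(prompt))
    return model_fn(prompt)

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # str.upper stands in for a real SLM inference call.
    print(guarded_generate("dana", "analyst", "summarize the Q3 report", str.upper))
```

A real deployment would typically rely on the organization's identity provider and monitoring stack rather than an in-process allowlist; the sketch only shows where such checks could sit relative to the model call.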

As with any software that handles sensitive information, there should be robust security reviews for every step of the fine-tuning and operationalization of the SLM.

However, “it is important to note that no security measure can guarantee complete and robust security of SLM-based applications,” said Kiani. “The security posture of these can be improved by designing with security-first principles. An insecure GenAI application is useless no matter how unique and wonderful it is.”
