Small Language Models Gaining Ground at Enterprises
While integration with legacy workflows remains a challenge, some emerging startups are coming up with solutions
At a Glance
- Enterprises are warming to smaller language models because of their lower computing costs and their fit for domain-specific use cases.
- The small language model generating the most excitement right now is Mixtral from Mistral, thanks to its mixture-of-experts architecture.
- One hurdle is adapting to platform changes; a good defense is building systems that make it easy to swap one small language model for another.
Small language models (SLMs) are becoming more attractive than large language models (LLMs) for enterprises to develop and deploy because they offer more control, such as fine-tuning for particular domains and tighter data security. They are also cheaper to run.
"We are seeing early adoption of SLMs in enterprises now, especially as hyperscalers like AWS and Azure are providing access to these models as hosted APIs,” said Pushpraj Shukla, senior vice president of engineering and head of AI/ML at SymphonyAI. “Our company uses these models to power NLU (natural language understanding) tasks for customers in retail, financial services and industrial categories. (But) our customers often do not realize they are using SLMs.”
An SLM is generally five to 10 times smaller than an LLM, and many are open source projects. The smaller size means much lower energy consumption, and an SLM can often be hosted on a single GPU. That is a major benefit given the shortage of these chips and the steep cost of compute.
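For teams curious what single-GPU hosting looks like in practice, here is a minimal sketch using the Hugging Face transformers library; the model ID and memory figures are illustrative, and device_map="auto" assumes the accelerate package is also installed.

```python
# A minimal sketch of hosting a 7B-parameter SLM on a single GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # any 7B-class SLM

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision: roughly 14 GB of weights
    device_map="auto",          # place the model on the available GPU
)

prompt = "Summarize the key benefits of small language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```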
Despite their reduced size, SLMs demonstrate capabilities remarkably close to those of LLMs on various NLU tasks. This is especially the case when they are effectively fine-tuned (or retrained) for specialized use cases, say, health care or coding. Fine-tuning an SLM can take minutes to several hours, compared with tens of hours to a few days for an LLM. To get effective results, the fine-tuning dataset often needs several hundred thousand examples.
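Here is a minimal sketch of what domain fine-tuning can look like using parameter-efficient LoRA adapters, assuming the Hugging Face transformers, peft and datasets libraries; the dataset file name and hyperparameters are illustrative, not a recipe.

```python
# A minimal LoRA fine-tuning sketch: the 7B base weights stay frozen
# and only small adapter matrices are trained, which is what keeps
# fine-tuning fast and cheap relative to full retraining.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# A domain dataset, e.g. several hundred thousand clinical or code
# examples, one JSON object with a "text" field per line (hypothetical).
data = load_dataset("json", data_files="domain_examples.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                     max_length=512), batched=True)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-finetuned", num_train_epochs=1,
                           per_device_train_batch_size=4, fp16=True),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```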
Another benefit of SLMs is faster training and inference, which translates into much lower latency. This makes them ideal for resource-constrained environments.
“Consider SLMs for highly regulated industries like healthcare or those dealing with sensitive personal data,” said Gustavo Soares, global product manager at Dell Technologies. “Their reduced complexity makes them a good choice for on-prem deployment, meeting strict compliance and data privacy standards.”
Some of the top SLMs in the market include Llama-2-13b and CodeLlama-7b from Meta; Mistral-7b and Mixtral 8x7b from Mistral; and Phi-2 and Orca-2 from Microsoft.
“The Llama 2 SLMs have been the top choice of the open source community since they were launched in August 2023, consistently performing well on LLM benchmarks across many different NLU tasks,” said Shukla. “But the Mistral-7b model has gained lots of momentum. It has been shown to beat Llama-13b and even the Llama-70b LLM on several tasks.”
“But the model generating the most excitement in the open-source community right now is Mixtral, a mixture-of-experts model from Mistral which uses eight underlying 7-billion-parameter models and a router on top, and for the first time can match or beat the performance of GPT-3.5 on almost all tasks,” he added. “And as for the Phi and Orca family of models from Microsoft, they are excellent, are focused on reasoning tasks and can be fine-tuned to be domain-adapted very quickly.”
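For readers curious how the mixture-of-experts idea works, here is a toy sketch in PyTorch. It is not Mixtral's actual code: real models route every token inside every transformer layer, and the dimensions here are tiny for illustration. The key point is that a router picks the top-k experts per token, so only a fraction of the parameters do work on any given input.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: route each token to top-k experts."""
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)  # scores each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        weights = torch.softmax(self.router(x), dim=-1)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)  # renormalize
        out = torch.zeros_like(x)
        # Only the chosen experts run per token, so compute stays close
        # to that of a much smaller dense model.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e
                if mask.any():
                    out[mask] += top_w[mask, k, None] * expert(x[mask])
        return out

x = torch.randn(10, 64)     # 10 tokens
print(MoELayer()(x).shape)  # torch.Size([10, 64])
```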
Then there are numerous SLMs with parameter sizes below one billion, such as DistilBERT, TinyBERT and T5-Small. They are mostly suited to narrow use cases, like summarization, but are ideal for highly constrained computing environments.
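As an illustration, a sub-1B model such as T5-Small (about 60 million parameters) can summarize text on commodity hardware, even without a GPU. A minimal sketch, assuming the Hugging Face transformers library:

```python
from transformers import pipeline

# t5-small is small enough to run comfortably on CPU.
summarizer = pipeline("summarization", model="t5-small")
text = ("Small language models are generally five to 10 times smaller "
        "than large language models, consume far less energy and can "
        "often be hosted on a single GPU, or even on CPU-only hardware.")
print(summarizer(text, max_length=40, min_length=10)[0]["summary_text"])
```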
Swapping different SLMs
However, there are major hurdles to adopting SLMs in the enterprise. One is that the technology is still in its nascent stages, and the platforms often change unexpectedly, which can make applications difficult to manage. A good defense is to build systems that allow different SLMs to be swapped in easily, as in the sketch below.
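What might such a system look like? Here is a minimal sketch of an abstraction layer that lets an application swap SLM back ends without touching calling code; the interface, provider classes and model IDs are illustrative assumptions, not product recommendations.

```python
from abc import ABC, abstractmethod

class SLMBackend(ABC):
    """Any SLM provider the application might use, local or hosted."""
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class LocalHFBackend(SLMBackend):
    """A locally hosted model, assuming the transformers library."""
    def __init__(self, model_id: str):
        from transformers import pipeline
        self._pipe = pipeline("text-generation", model=model_id)

    def generate(self, prompt: str) -> str:
        return self._pipe(prompt, max_new_tokens=128)[0]["generated_text"]

class HostedAPIBackend(SLMBackend):
    """A hypothetical hosted-SLM HTTP endpoint."""
    def __init__(self, url: str, api_key: str):
        self._url, self._key = url, api_key

    def generate(self, prompt: str) -> str:
        import requests
        r = requests.post(self._url, json={"prompt": prompt},
                          headers={"Authorization": f"Bearer {self._key}"})
        return r.json()["text"]

# Application code depends only on the interface; swapping models or
# providers becomes a one-line configuration change.
backend: SLMBackend = LocalHFBackend("microsoft/phi-2")
print(backend.generate("Explain mixture-of-experts in one sentence."))
```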
Another challenge is that working with this type of technology requires specialized expertise, such as in ML operations (MLOps). This talent is not easy to find, and it can be expensive.
Integrating SLMs with legacy systems is no easy feat either. Enterprises need to manage complex pre-processing and post-processing workflows that refine and adapt the data flowing between legacy systems and the model, and current SLMs often cannot handle this on their own.
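As a rough illustration, the sketch below wraps a model call with pre- and post-processing steps; the field names, redaction rule and label set are hypothetical.

```python
import re

def preprocess(record: dict) -> str:
    """Turn a raw legacy record into a clean prompt, redacting PII."""
    text = record.get("notes", "")
    text = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[REDACTED]", text)  # SSNs
    return f"Classify the following support ticket:\n{text}"

def postprocess(raw_output: str) -> dict:
    """Validate model output before it re-enters the legacy workflow."""
    lines = raw_output.strip().splitlines()
    label = lines[0].lower() if lines else ""
    allowed = {"billing", "technical", "account"}
    return {"label": label if label in allowed else "needs_review"}

def run(record: dict, generate) -> dict:
    # `generate` is any callable mapping a prompt to model text, e.g.
    # the SLMBackend.generate method sketched earlier.
    return postprocess(generate(preprocess(record)))

print(run({"notes": "Card charged twice. SSN 123-45-6789."},
          lambda prompt: "billing"))  # model stubbed for illustration
```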
Finally, enterprises will still need to weigh the differences between LLMs and SLMs. “There is fear among developers and enterprise users about the quality tradeoffs they have to make against closed-source LLMs like OpenAI’s GPT-4, which continues to be the gold standard on pretty much all NLU tasks in the enterprise,” said Shukla.
“To make sure that they are not trading off quality too much for speed and cost, enterprises need to understand how to measure the quality of SLMs vs. LLMs on their tasks, which is based on human judgments on sample sets and is non-trivial in many cases,” he said.
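As a sketch of that kind of measurement, assume human judges have already scored SLM and LLM outputs on a shared sample set, one JSON record per line; the file name and field names here are hypothetical.

```python
import json
from statistics import mean

# Each row: {"task_id": ..., "model": "slm" or "llm", "score": 1-5}
with open("human_judgments.jsonl") as f:
    judgments = [json.loads(line) for line in f]

def mean_score(model_name: str) -> float:
    return mean(j["score"] for j in judgments if j["model"] == model_name)

slm, llm = mean_score("slm"), mean_score("llm")
gap = (llm - slm) / llm * 100
print(f"SLM: {slm:.2f}  LLM: {llm:.2f}  quality gap: {gap:.1f}%")
# The gap can then be weighed against the SLM's latency and cost savings.
```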
Thus far, companies have been hiring consultants or using in-house experts to address these issues. But startups are emerging with solutions.
For example, OctoAI is developing automations for hosting fine-tuned models. Then there is Databricks, whose MosaicML acquisition is aimed at simplifying the fine-tuning process.