What is a Large Language Model?
Learn about the different types of large language models and how they can be used to improve your machine learning systems.
The rise of generative artificial intelligence (AI) has brought large language models to consumers and enterprises alike. As complex tools, trained through machine learning, large language models are capable of general-purpose understanding and generation of human languages and code.
This has forced a rapid rethink of the responsible AI tools and techniques needed to rein in large language models and, over time, align them with both organizational requirements and governmental regulations.
What is a language model?
Alexa and Siri can process speech audio. You can use ServiceHub to handle your customer service requests. And you can translate phrases at the touch of a button using Google Translate. But none of these natural language processing (NLP) tasks is possible without a language model.
The concept of a “language model” has existed for decades – in essence, language models are probabilistic machine learning models of natural languages, which can be used for numerous tasks, including speech recognition, machine translation, answering questions, semantic searching, and optical character and handwriting recognition.
By analyzing data such as text, audio, or images, a language model learns from context to predict what comes next in a sequence. Most often, language models are used in NLP applications that generate text-based outputs, such as machine translation.
For example, when you start typing an email, Gmail may complete the sentence for you. This happens because Gmail has been trained on volumes of data to know that when someone starts typing “It was very nice …” often the next words are “… to meet you.”
How do language models work?
Language models are fed sizable amounts of data, which an algorithm analyzes to learn the context in which words and phrases appear. The model then applies what it has learned to produce the desired outcome, such as making a prediction or responding to a query.
Essentially, a language model learns the statistical characteristics of a language and applies what it has gleaned to new phrases.
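As an illustrative sketch (a toy bigram counter, not any production system's implementation), the prediction step described above can be reduced to counting which word most often follows another in training text:

```python
from collections import Counter, defaultdict

# Toy training text; real models learn from vastly larger corpora.
corpus = "it was very nice to meet you . it was very nice to see you .".split()

# Count how often each word follows each preceding word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often observed after `word`, or None if unseen."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("nice"))  # "to": the only word seen after "nice" here
```

Gmail-style autocomplete works on the same principle, only with far richer context than a single preceding word.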
There are several different probabilistic approaches to modeling language, which vary depending on the purpose of the language model. From a technical perspective, the various types differ by the amount of text data they analyze and the math they use to analyze it.
For example, a language model designed to generate sentences for an automated Twitter bot may use different math and analyze text data differently than a language model designed for determining the likelihood of a search query.
What are the different types of language models?
There are two main types of language models:
Statistical Language Models – probabilistic models that predict the next word in a sequence from observed word frequencies.
Neural Language Models – models that use neural networks to make predictions.
Statistical Language Model types
Unigram – The simplest type of language model. Unigram models evaluate each term independently and do not require any conditioning context to make a decision. They are used for NLP tasks such as information retrieval.
N-gram – N-grams (also known as Q-grams) are simple and scalable. These models assign a probability to the next item in a sequence based on the items that precede it. N-gram models are widely used in probability, communication theory, and statistical natural language processing.
Exponential – Also referred to as maximum entropy models. These models score sequences with an equation that combines feature functions, often defined over n-grams, with learned weights. The modeler specifies the features; training then finds the weights that fit the data while otherwise making the fewest assumptions (maximizing entropy). Most commonly found in computational linguistics applications.
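To make the contrast between the first two types concrete, here is a minimal sketch (toy data, assumed purely for illustration) of how a unigram model scores words independently while a bigram model conditions each word on the one before it:

```python
from collections import Counter

tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)                  # individual word counts
bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word-pair counts
total = len(tokens)

# Unigram: each word is scored independently, with no conditioning context.
p_unigram = (unigrams["the"] / total) * (unigrams["mat"] / total)

# Bigram: "mat" is conditioned on the preceding word "the".
p_bigram = bigrams[("the", "mat")] / unigrams["the"]

print(p_unigram)  # (2/6) * (1/6)
print(p_bigram)   # 1/2: "the" is followed by "mat" in one of its two occurrences
```

Real n-gram systems add smoothing so that word sequences never seen in training do not receive a probability of exactly zero.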
Neural Language Model types
Speech Recognition – The likes of Alexa and Siri understand spoken requests through this application. Essentially, it gives a device the ability to process speech audio.
Machine Translation – The conversion of one language to another via a machine. Google Translate and Microsoft Translator use this type.
Parsing Tools – This model type analyzes data or a sentence to determine whether it follows grammatical rules. Tools like Grammarly use parsing to understand a word’s context relative to the other words in a document.
What is a large language model?
Large language models are language models that can understand and generate language, trained on massive datasets often drawn from raw sources such as social networks and code repositories. They help fuel the new wave of generative AI, which is defined by its ability to produce new and original work of its own.
With just a simple text, image or audio prompt, generative AI can produce content in seconds on spec—be it an original essay on trickle-down economics, a picture of New York drawn in the style of Monet or a rap about Reese’s Pieces.
Well before the emergence of generative AI, market researchers concluded that natural language processing models trained on these public data sets inherently contain more bias than those trained on highly curated data sets. For this reason, many enterprise large language model solutions incorporate training (or fine-tuning) data derived from highly curated data sets – and, increasingly, data sets that language models themselves generate.
What are some examples of large language models available today?
ChatGPT – ChatGPT is the application that truly kick-started the mainstream public’s fascination with AI.
Released in November 2022, ChatGPT is an interface application that allows users to ask questions and receive generated responses.
It launched using a combination of InstructGPT and GPT-3.5, with the more powerful GPT-4 later powering premium versions of the application.
ChatGPT has gone on to act as the basis for a series of Microsoft products after the software giant invested in OpenAI to gain access to the application.
ChatGPT is a closed system, meaning OpenAI keeps full control and ownership of the application. OpenAI has kept the parameter levels of GPT-4 private.
LLaMA – Large Language Model Meta AI (LLaMA) is designed for researchers and developers to build and experiment with models.
LLaMA is an open source model designed to be smaller than the likes of GPT-3. It’s designed for users who lack the computing power to develop language models.
Since its release in late February 2023, LLaMA has been routinely fine-tuned by researchers to create other language models, such as Vicuna.
I-JEPA – I-JEPA is an AI model published by Meta in June 2023. The model itself is not the star, but rather how it was built using a new architecture.
The JEPA approach can predict missing information akin to a human’s ability for general understanding, something the generative AI method cannot do.
Meta’s chief AI scientist Yann LeCun has continuously proposed the idea that deep learning AI models can learn about their surroundings without the need for human intervention. The JEPA approach aligns with that vision and doesn’t involve any overhead associated with applying more computationally intensive data augmentations to produce multiple views.
PaLM 2 – Google’s flagship language model, unveiled at the company’s annual I/O conference. The model supports over 100 languages and is designed to be fine-tuned for domain-specific applications.
PaLM 2 comes in a variety of sizes – each named after an animal to indicate its scale. Gecko is the smallest, followed by Otter and Bison, up to Unicorn, the largest.
Auto-GPT – Short for Autonomous GPT, Auto-GPT is an open source project attempting to give internet users access to a powerful, largely autonomous language model.
Auto-GPT is built off OpenAI’s GPT-4 and can be used to automate social media accounts or generate text, among other use cases.
The model grew popular online following its April 2023 launch, with the likes of former Tesla AI chief Andrej Karpathy among those praising the model’s abilities.
Gorilla – The first AI model on this list to use Meta’s LLaMA as its base, Gorilla was fine-tuned to improve its ability to make API calls – or, more simply, to work with external tools.
The end-to-end model is designed to serve API calls without requiring any additional coding and can be integrated with other tools.
Gorilla can be used commercially in tandem with Apache 2.0 licensed LLM models.
Claude – Developed by Anthropic, which was founded by former OpenAI staff who left over disagreements about the company’s close ties with Microsoft.
Anthropic went on to develop Claude, a chatbot application not too dissimilar to ChatGPT apart from one thing: an increased focus on safety.
Claude uses constitutional AI, a method developed by Anthropic to prevent it from generating potentially harmful outputs. The model is given a set of principles to abide by, almost like giving it a form of 'conscience.'
Stable Diffusion XL – Stable Diffusion XL is the latest iteration of the text-to-image model that rose to fame in 2022.
SDXL 0.9 also boasts image-to-image capabilities, meaning users can use an image as a prompt to generate another image. Stable Diffusion XL also allows for inpainting, where it can fill in missing or damaged parts of an image, and outpainting, which extends an existing image.
Dolly/Dolly 2.0 – Named after Dolly the sheep, the world’s first cloned mammal, the Dolly AI models from Databricks are designed to be small and much less costly to train compared to other models on this list.
Dolly is a fine-tuned version of EleutherAI’s GPT-J language model. Dolly is designed to be highly customizable, with users able to create their own ChatGPT-like chatbots using internal data.
Dolly 2.0 is built using EleutherAI’s Pythia model family. The later iteration was fine-tuned on an instruction-following dataset crowdsourced among Databricks employees. It’s designed for both research and commercial use.
XGen-7B – XGen-7B, developed by Salesforce, is a family of large language models designed to sift through lengthy documents to extract data insights.
Salesforce researchers took a series of seven-billion-parameter models and trained them using JaxFormer, Salesforce’s in-house library, on public-domain instructional data. The resulting models can handle sequence lengths of up to 8,000 tokens and were trained on up to 1.5 trillion tokens.
Vicuna