12 Language Models You Need to Know

The future of natural language processing: Next-generation AI models to power enterprise use cases

July 12, 2023


At a Glance

AI Business breaks down essential language models you need to know – from ChatGPT and Claude to I-JEPA and Inflection-1

Last July, AI Business published a list of 7 language models you need to know. At the time, there was no ChatGPT, PaLM 2 or LLaMA.

Twelve months on, AI has entered mainstream consciousness as consumers and enterprises race to use these newfound AI tools and models.

AI Business has compiled a new list looking at some of the most important models shaking up the AI space.

ChatGPT

Developer: OpenAI

Parameters: Unknown

ChatGPT is the application that truly kick-started the mainstream public’s fascination with AI.

Released in November 2022, ChatGPT is a chat interface that lets users ask questions and receive generated responses.

It launched on a model from the GPT-3.5 series, a sibling of InstructGPT. Premium versions of the application were later powered by the more capable GPT-4.

ChatGPT has gone on to act as the basis for a series of Microsoft products after the software giant invested in OpenAI to gain access to the application.

ChatGPT is a closed system, meaning OpenAI keeps full control and ownership of the application. OpenAI has kept the parameter levels of GPT-4 private.

Use cases

Text generation and summarization – ChatGPT can generate human-like text via detailed responses. As well as answering a variety of questions, it also has effective summarization abilities, able to parse through extensive pieces of information to deliver concise, understandable summaries, making it a powerful tool for distilling complex content into more manageable forms.

Code generation – ChatGPT can generate code across multiple programming languages, offering solutions to coding problems, helping with debugging and demonstrating coding practices. Its results, while generally reliable, should still be reviewed for accuracy and optimality. ChatGPT is the most popular AI tool among developers, according to a Stack Overflow survey.

Operating robots – A team of engineers at Microsoft showcased the possibility of allowing ChatGPT to control a robot. In one demo, OpenAI’s language model was hooked up to a robotic arm and tasked with solving puzzles. ChatGPT asked the researchers clarifying questions when the user’s instructions were ambiguous. In another experiment, it wrote code structures for a drone it was controlling.

Summarizing bills for lawmakers – Congressional staff are using ChatGPT. Some 40 licenses to ChatGPT+, the $20 premium version that uses GPT-4, were awarded to Congressional staff. There was no official word on what the licenses were to be used for, though reports surfaced that staff were using the tool to create and summarize content.

Key reading on ChatGPT

Introducing ChatGPT

ChatGPT Passes 1 Billion Page Views

Learn how to work with the ChatGPT and GPT-4 models

LLaMA

Developers: Meta, FAIR

Parameters: From 7 to 65 billion

LLaMA, which stands for Large Language Model Meta AI, is designed for researchers and developers to build models on. It is an openly available model designed to be smaller than the likes of GPT-3, aimed at users who lack the computing power to develop language models from scratch.

Since its release in late February 2023, LLaMA has been routinely fine-tuned by researchers to create other language models, such as Vicuna.

Use cases

Open source model development – LLaMA has formed the underlying basis for a variety of open source AI models, including Vicuna, Alpaca and Gorilla, to name a few. LLaMA is made to be tinkered with, and researchers and developers alike have flocked to it. The seven billion parameter version of LLaMA has proven especially popular because its size means it requires less computing power to run.

Key reading on LLaMA

Introducing LLaMA: A foundational, 65-billion-parameter large language model

Meta: LLaMA Language Model Outperforms OpenAI’s GPT-3

LLaMA: Open and Efficient Foundation Language Models

Access LLaMA code - https://github.com/facebookresearch/llama

I-JEPA

Developers: Meta, FAIR

Parameters: Unknown

I-JEPA is an AI model published by Meta in June 2023. The model itself is not the star, but rather how it was built: using a new architecture.

The JEPA approach can predict missing information akin to a human’s ability for general understanding, something the generative AI method cannot do.

Meta’s chief AI scientist Yann LeCun has continuously proposed the idea that deep learning AI models can learn about their surroundings without the need for human intervention. The JEPA approach aligns with that vision and also doesn’t involve any overhead associated with applying more computationally intensive data augmentations to produce multiple views.

Use cases

Self-Supervised Learning from Images – I-JEPA (Image Joint Embedding Predictive Architecture) creates an internal model of a subject and compares abstract representations of images rather than comparing individual pixels themselves.

Effectively, I-JEPA learns and applies that information to a variety of applications without needing extensive fine-tuning.

Key reading on I-JEPA

Yann LeCun’s AI Vision Realized with New Meta I-JEPA Model

I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Access I-JEPA’s model code and checkpoints - https://github.com/facebookresearch/ijepa

PaLM 2

Developer: Google

Parameters: 340 billion (reported)

PaLM 2 is Google’s flagship language model. Unveiled at the company’s annual I/O conference, the model supports over 100 languages and is designed to be fine-tuned for domain-specific applications.

PaLM 2 comes in a variety of sizes, each named after an animal to represent its size. Gecko is the smallest, followed by Otter and Bison, up to Unicorn, the largest.

Use cases

Chatbot improvement – PaLM 2 now powers Bard, Google’s answer to ChatGPT, enabling it to generate text and code as well as summarize documents.

Audio generation and speech processing – PaLM 2, when combined with an audio generation model, can be used to generate text and speech for speech recognition and speech-to-speech translation. When combined with AudioLM, PaLM 2 can leverage larger quantities of text training data to assist with speech tasks. Google contends that adding a text-only large language model to an audio-generative system improves speech processing and outperforms existing systems for speech translation tasks.

Health care applications – Among the examples of PaLM 2 being fine-tuned and applied to a specific sector, Med-PaLM 2 showcases the model’s versatility. Users can prompt the model to identify medical issues in images, like X-rays. According to Google researchers, Med-PaLM 2 achieved a nine-times reduction in inaccurate reasoning, approaching the performance of clinicians answering the same set of questions.

Key reading on PaLM 2

Google AI - Introducing PaLM 2

PaLM 2 Technical Report

Google I/O Analysis: PaLM 2 vs. Hyperscalers' Approach

Auto-GPT

Developers: Auto-GPT development team

Parameters: Unknown

Short for Autonomous GPT, Auto-GPT is an open source project attempting to provide internet users access to a powerful language model. Auto-GPT is built off OpenAI’s GPT-4 and can be used to automate social media accounts or generate text, among other use cases.

The model grew popular online following its April 2023 launch, with the likes of former Tesla AI chief Andrej Karpathy among those praising the model’s abilities.

Use cases

Automating Twitter accounts – Despite Elon Musk’s attempts to take down bot accounts, Auto-GPT could be used to power Twitter profiles. Auto-GPT is used to power the IndiepreneurGPT account and automatically tweets from it.

Automating general processes – Auto-GPT is designed for experimentation. So far, developers have used the model to do such things as autonomously order pizza, plan trips, or book flights. The team behind it warns, however, that it's not “polished” and “may not perform well in complex, real-world business scenarios.”

Key reading on Auto-GPT

Auto-GPT: The One AI Assistant to Rule Them All?

Auto-GPT: An Autonomous GPT-4 Experiment

Auto-GPT Unmasked: The Hype and Hard Truths of Its Production Pitfalls

Access Auto-GPT - https://github.com/Significant-Gravitas/Auto-GPT

Gorilla

Developers: UC Berkeley, Microsoft Research

Parameters: Seven billion

The first AI model on this list to utilize Meta’s LLaMA as its body, Gorilla was fine-tuned to improve its ability to make API calls – or more simply, work with external tools. The end-to-end model is designed to serve API calls without requiring any additional coding and can be integrated with other tools.

Gorilla can be used commercially in tandem with Apache 2.0 licensed LLM models.

Use cases

Virtual assistants – By utilizing APIs, Gorilla can be applied to many applications. By accessing calendar APIs, for example, Gorilla could be used to power virtual assistant applications. When queried, the model could return the current date without taking any input.

Search improvements – Using natural language prompts in a search tab, Gorilla could access a search-focused API, like Wikipedia search, to return short text snippets or have an improved understanding of tasks. For example, instead of listing all files under a certain name, it would list the most recent file relevant to the context.
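To make the idea of a model "making API calls" concrete, here is a minimal sketch of how an application might route a structured call emitted by a model to a real function. It is not Gorilla's own interface: the tool names, the `TOOLS` registry, the `dispatch` helper and the call format are all hypothetical, chosen only to mirror the two examples above (returning the current date with no input, and picking the most recent file).

```python
from datetime import date

# Hypothetical tools; the names and signatures are illustrative assumptions,
# not part of Gorilla or any real API.
def get_current_date() -> str:
    """Return today's date in ISO format; takes no input."""
    return date.today().isoformat()

def most_recent_file(files: dict) -> str:
    """Return the file name with the latest modification timestamp."""
    return max(files, key=files.get)

# Registry mapping tool names (as a model might emit them) to functions.
TOOLS = {
    "get_current_date": get_current_date,
    "most_recent_file": most_recent_file,
}

def dispatch(call: dict):
    """Run a structured call of the form {"name": ..., "args": {...}}."""
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("args", {}))

# A call with no arguments, like the current-date example.
today = dispatch({"name": "get_current_date"})

# A call that returns only the most recent matching file.
latest = dispatch({
    "name": "most_recent_file",
    "args": {"files": {"report_v1.txt": 1.0, "report_v2.txt": 5.0}},
})
print(latest)  # report_v2.txt
```

Keeping the registry explicit means the application, not the model, decides which functions can ever be executed.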

Key reading on Gorilla

Meet Gorilla: The AI Model That Beats GPT-4 at API Calls

Gorilla: Large Language Model Connected with Massive APIs

Gorilla Spotlight Demo

Try Gorilla via Colab - https://colab.research.google.com/drive/1DEBPsccVLF_aUnmD0FwPeHFrtdC0QIUP?usp=sharing

Access the Gorilla code - https://github.com/ShishirPatil/gorilla

Claude

Developer: Anthropic

Parameters: Unknown (although Anthropic’s Constitutional AI paper refers to one AnthropicLM v4-s3, which boasts 52 billion parameters)

Think of Claude as ChatGPT’s sensible cousin. Anthropic was founded by former OpenAI staff who left over disagreements about close ties with Microsoft.

Anthropic went on to develop Claude, a chatbot application not too dissimilar to ChatGPT apart from one thing – increased focus on safety.

Claude uses constitutional AI, a method developed by Anthropic to prevent it from generating potentially harmful outputs. The model is given a set of principles to abide by, almost like giving it a form of 'conscience.'

At the time of writing, Claude 2 is the latest version. Unveiled in July 2023, Claude 2 boasts improved performance and is capable of acting as a business or technical analyst.

Use cases

Document analysis – Claude can be used to obtain insights from multiple lengthy documents or even books. Users then ask Claude questions about documents. This feature comes from Claude’s sizable context window, a range of tokens that the AI considers before generating an output. Claude’s context window spans 100,000 tokens of text – or around 75,000 words.

Text generation and summarization – Like ChatGPT, Claude can be prompted to generate responses to questions or generate summarizations of pieces of text.
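The context-window figures quoted above (100,000 tokens, or around 75,000 words) imply roughly 0.75 words per token. The sketch below uses that ratio to estimate whether a document would fit. It is a rough heuristic under that assumption, not Anthropic's tokenizer, and the function names are hypothetical.

```python
import math

CONTEXT_WINDOW_TOKENS = 100_000  # figure quoted in the article
WORDS_PER_TOKEN = 0.75           # implied by 75,000 words / 100,000 tokens

def estimated_tokens(text: str) -> int:
    """Estimate the token count from a whitespace word count (heuristic)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)

def fits_in_context(text: str, limit: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the estimated token count is within the limit."""
    return estimated_tokens(text) <= limit

book = "word " * 75_000
print(estimated_tokens(book))  # 100000
print(fits_in_context(book))   # True
```

Real token counts vary with language and content, so an estimate like this is best treated as a first-pass check before calling a model.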

Key reading on Claude

Introducing Claude

Meet Claude 2: Enhanced ChatGPT Rival from Google-backed Anthropic

Google Invests $300M in AI Startup Founded by OpenAI Alumni

Measuring Progress on Scalable Oversight for Large Language Models

Access Claude API - https://www.anthropic.com/product

Try Claude in Slack - https://www.anthropic.com/claude-in-slack

Access the Claude 2 beta - https://claude.ai/login

Stable Diffusion XL

Developer: Stability AI

Parameters: Base model: 3.5 billion parameters, Model ensemble pipeline: 6.6 billion parameters

Stable Diffusion XL is the latest iteration of the text-to-image model that rose to fame in 2022. At the time of writing, 0.9 is the most up-to-date version, which can generate hyper-realistic images.

SDXL 0.9 also boasts image-to-image capabilities, meaning users can use an image as a prompt to generate another image. Stable Diffusion XL also allows for inpainting, where it can fill in missing or damaged parts of an image, and outpainting, which extends an existing image.

Use cases

Image generation – Same as the original Stable Diffusion, the XL version can be used to generate images from natural language prompts. The latest version, however, utilizes two models, the second of which is designed to add finer details to the generated outputs as part of a two-stage process.

Reimagine – Using Stability’s Clipdrop platform, Stable Diffusion XL can be used to create multiple variations from a single image. Simply click, paste or upload an image to generate possible ways of altering images for website illustrations or concept art.

Film and TV – Stability claims SDXL generations could be used in television, music, and instructional videos, as well as “offering advancements for design and industrial use.”

Key reading on Stable Diffusion XL

Stable Diffusion's New Version Adds Image-to-Image Generation

Stable Diffusion XL

Stability AI launches SDXL 0.9: A Leap Forward in AI Image Generation

Access Stable Diffusion XL 0.9 - https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9

Dolly/ Dolly 2.0

Developer: Databricks

Parameters: Dolly: six billion parameters, Dolly 2.0: 12 billion parameters

Named after Dolly the sheep, the world’s first cloned mammal, the Dolly AI models from Databricks are designed to be small and less costly to train compared to other models on this list.

Dolly, first showcased in March, cost just $30 to train. It’s a fine-tuned version of EleutherAI’s GPT-J language model. Dolly is designed to be highly customizable, with users able to create their own ChatGPT-like chatbots using internal data.

Dolly 2.0 came a month later and was built using EleutherAI’s Pythia model family. The later iteration was fine-tuned on an instruction-following dataset crowdsourced among Databricks employees. It’s designed for both research and commercial use. Databricks did not say how much it cost to train Dolly 2.0, however.

Use cases

Text generation and document summarization – Like ChatGPT and other models on this list, either version of Dolly can produce text when prompted using natural language. Its advantage over others comes from its customizability, with enterprises able to use the easily accessible code to build their own versions.

Key reading on Dolly

Hello Dolly: A Cheap, Customizable ChatGPT ‘Clone’

Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM

Databricks Launches Dolly 2.0: An Open Source LLM for Commercial Use

Access the Dolly code - https://github.com/databrickslabs/dolly

Access the Dolly 2.0 code - https://huggingface.co/databricks/dolly-v2-12b

XGen-7B

Developer: Salesforce

Parameters: Seven billion

XGen-7B is a family of large language models designed to sift through lengthy documents to extract data insights.

Salesforce researchers took a series of seven billion parameter models and trained them using Salesforce’s in-house library, JaxFormer, as well as public-domain instructional data. The resulting models handle sequence lengths of up to 8,000 tokens and were trained on up to 1.5 trillion tokens.

Use cases

Data analysis – Models like Meta’s LLaMA have a maximum sequence length of only around 2,000 tokens, meaning they would struggle to extract insights from lengthy unstructured data sources like documents. XGen-7B, however, can sift through lengthy documents with ease, extracting insights when prompted.

Code generation – The XGen-7B models draw on StarCoder, the code-generation model from the BigCode project led by Hugging Face and ServiceNow, to support XGen’s code-generation tasks.

Chatbot conversation capabilities – When applications like ChatGPT and Bing’s AI chat first appeared, the longer users conversed with them, the more confused the underlying model became, as it was unable to handle long context lengths.

Applying XGen to chatbots so they can understand longer inputs could be a huge benefit for businesses. Salesforce’s researchers claim that a large context “allows a pre-trained LLM to look at customer data and respond to useful information-seeking queries.”
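The sequence-length comparison in the data analysis use case above can be turned into simple arithmetic: the shorter the context, the more pieces a long document must be split into. The following sketch uses the article's figures of roughly 2,000 and 8,000 tokens; the `chunks_needed` helper, the 20,000-token document and the reserved-token allowance are illustrative assumptions.

```python
import math

def chunks_needed(doc_tokens: int, context_tokens: int, reserved_tokens: int = 0) -> int:
    """Number of chunks required to pass a document through a model.

    reserved_tokens is an assumed allowance for the prompt and the reply.
    """
    usable = context_tokens - reserved_tokens
    if usable <= 0:
        raise ValueError("reserved_tokens must be smaller than context_tokens")
    return math.ceil(doc_tokens / usable)

doc_tokens = 20_000  # hypothetical long document

print(chunks_needed(doc_tokens, 2_000))  # 10
print(chunks_needed(doc_tokens, 8_000))  # 3
```

Fewer chunks means fewer places where context is lost between pieces, which is the practical benefit of a longer sequence length.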

Key reading on XGen-7B

Salesforce's New AI Models Could Improve Data Analysis

Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length

Access the XGen-7B codebase: https://github.com/salesforce/xGen

Access the XGen-7B model checkpoint: https://huggingface.co/Salesforce/xgen-7b-8k-base

Vicuna

Developer: LMSYS Org

Parameters: 7 billion, 13 billion

Vicuna is an open source chatbot and the second model on this list to be a fine-tuned LLaMA model. To fine-tune it, the team behind Vicuna used user-shared conversations collected from ShareGPT.

It cost LMSYS Org just $300 to train the model. Its researchers claim that Vicuna achieves more than 90% of the quality of OpenAI’s ChatGPT and Google Bard while outperforming other models like LLaMA and Stanford Alpaca. It’s important to note that OpenAI hasn’t published technical details on GPT-4, which now powers part of ChatGPT, so it’s difficult to verify those findings.

Use cases

Text generation, assistance – Like most models on this list, Vicuna can be applied to generate text and even act as a way to power a virtual assistant, with users able to prompt the bot using natural language.

Key reading on Vicuna

Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Demo: Vicuna - a chat assistant fine-tuned from LLaMA on user-shared conversations

Access the Vicuna code - https://huggingface.co/lmsys/vicuna-13b-delta-v1.1

Inflection-1

Developer: Inflection AI

Parameters: Unknown

Inflection-1 is the model developed by the AI research lab Inflection to power Pi.ai, its virtual assistant application.

Inflection used “thousands” of H100 GPUs from Nvidia to train the model. The startup applied proprietary technical methods to get the model to perform on par with the likes of Chinchilla from DeepMind and Google's PaLM-540B.

Inflection kept its language model work entirely in-house, from data ingestion to model design. The model will, however, be available via Inflection’s conversational API soon.

Use cases

Powering personal assistants – Inflection-1 started as a way to power Pi.ai. The resulting application is intended to come across as “empathetic, useful, and safe,” according to the team behind it. Inflection-1 can also be used to generate code from a natural language description and generate answers to math questions.

Key reading on Inflection-1

Inflection-1: Pi’s Best-in-Class LLM

Inflection's Mammoth $1.5B Funding Set to Rattle OpenAI's Throne

Inflection-1 technical memo

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
