July 12, 2023
At a Glance
- AI Business breaks down essential language models you need to know – from ChatGPT and Claude to I-JEPA and Inflection-1
Last July, AI Business published a list of 7 language models you need to know. At the time, there was no ChatGPT, PaLM 2 or LLaMA.
Twelve months on, AI has entered mainstream consciousness as consumers and enterprises race to use these newfound AI tools and models.
AI Business has compiled a new list looking at some of the most important models shaking up the AI space.
ChatGPT
ChatGPT is the application that truly kick-started the mainstream public’s fascination with AI.
Released in November 2022, ChatGPT is a chatbot interface that lets users ask questions and receive generated responses.
ChatGPT has gone on to act as the basis for a series of Microsoft products after the software giant invested in OpenAI to gain access to the application.
ChatGPT is a closed system, meaning OpenAI keeps full control and ownership of the application. OpenAI has kept the parameter levels of GPT-4 private.
Text generation and summarization – ChatGPT can generate human-like text via detailed responses. As well as answering a variety of questions, it also has effective summarization abilities, able to parse through extensive pieces of information to deliver concise, understandable summaries, making it a powerful tool for distilling complex content into more manageable forms.
Code generation - ChatGPT can generate code across multiple programming languages, offering solutions to coding problems, and helping in debugging and demonstrating coding practices. Its results, while generally reliable, should still be reviewed for accuracy and optimality. ChatGPT is the most popular AI tool among developers, according to a Stack Overflow survey.
Operating robots – A team of engineers at Microsoft showcased the possibility of allowing ChatGPT to control a robot. A demo saw OpenAI’s language model hooked up to a robotic arm and tasked to solve some puzzles. ChatGPT asked the researchers clarification questions when the user’s instructions were ambiguous. It even wrote code structures for a drone it was controlling in another experiment.
Summarizing bills for lawmakers – Congressional staff are using ChatGPT. Some 40 licenses to ChatGPT Plus, the $20-a-month premium version that uses GPT-4, were awarded to Congressional staff. There was no official word on what the licenses were to be used for, though reports surfaced that staff were using the tool to create and summarize content.
Key reading on ChatGPT
LLaMA
Parameters: From 7 to 65 billion
LLaMA, which stands for Large Language Model Meta AI, is an open source model built for researchers and developers to create models of their own. It is smaller than the likes of GPT-3 and is aimed at users who lack the computing power to develop large language models from scratch.
Since its release in late February 2023, LLaMA has been routinely fine-tuned by researchers to create other language models, such as Vicuna.
Open source model development – LLaMA has formed the underlying basis for a variety of open source AI models, including Dolly, Alpaca and Gorilla, to name a few. LLaMA is made to be tinkered with, and researchers and developers alike have flocked to it. The seven-billion-parameter version has proven especially popular, as its size means it requires less computing power to run.
Key reading on LLaMA
Access LLaMA code - https://github.com/facebookresearch/llama
I-JEPA
Developers: Meta, FAIR
I-JEPA is an AI model published by Meta in June 2023. The model itself is not the star, but rather how it was built: Using a new architecture.
The JEPA approach can predict missing information in a way akin to a human’s general understanding of the world, something generative AI methods cannot do.
Meta’s chief AI scientist Yann LeCun has continuously proposed the idea that deep learning AI models can learn about their surroundings without the need for human intervention. The JEPA approach aligns with that vision and also doesn’t involve any overhead associated with applying more computationally intensive data augmentations to produce multiple views.
Self-Supervised Learning from Images – I-JEPA (Image Joint Embedding Predictive Architecture) creates an internal model of a subject and compares abstract representations of images rather than comparing individual pixels themselves.
Effectively, I-JEPA learns and applies that information to a variety of applications without needing extensive fine-tuning.
Key reading on I-JEPA
Access I-JEPA’s model code and checkpoints - https://github.com/facebookresearch/ijepa
PaLM 2
Parameters: 340 billion (reported)
PaLM 2 is Google’s flagship language model. Unveiled at the company’s annual I/O conference, the model supports over 100 languages and is designed to be fine-tuned for domain-specific applications.
PaLM 2 comes in a variety of sizes, each named after an animal to represent its scale: Gecko is the smallest, followed by Otter and Bison, up to Unicorn, the largest.
Chatbot improvement – PaLM 2 now powers Bard, Google’s answer to ChatGPT, enabling it to generate text and code as well as summarize documents.
Audio generation and speech processing – PaLM 2, when combined with an audio generation model, can be used to generate text and speech for speech recognition and speech-to-speech translation. When combined with AudioLM, PaLM 2 can leverage larger quantities of text training data to assist with speech tasks. Google contends that adding a text-only large language model to an audio-generative system improves speech processing and outperforms existing systems for speech translation tasks.
Health care applications – Among examples of PaLM 2 being fine-tuned for a specific sector, Med-PaLM 2 showcases the model’s versatility. Users can prompt it to identify medical issues in images such as X-rays. According to Google researchers, Med-PaLM 2 achieved a ninefold reduction in inaccurate reasoning, approaching clinicians’ performance on the same set of questions.
Key reading on PaLM 2
Auto-GPT
Developers: Auto-GPT development team
Short for Autonomous GPT, Auto-GPT is an open source project that gives internet users access to a powerful language model by chaining together calls to OpenAI’s GPT-4 to complete tasks with minimal human input. It can be used to automate social media accounts or generate text, among other use cases.
The model grew popular online following its April 2023 launch, with the likes of former Tesla AI chief Andrej Karpathy among those praising the model’s abilities.
Automating Twitter accounts – Despite Elon Musk’s attempts to take down bot accounts, Auto-GPT can be used to power Twitter profiles. It powers the IndiepreneurGPT account, automatically generating its tweets.
Automating general processes – Auto-GPT is designed for experimentation. So far, developers have used the model to do such things as autonomously order pizza, plan trips, or book flights. The team behind it warns, however, that it's not “polished” and “may not perform well in complex, real-world business scenarios.”
Key reading on Auto-GPT
Access Auto-GPT - https://github.com/Significant-Gravitas/Auto-GPT
Gorilla
Developers: UC Berkeley, Microsoft Research
Parameters: Seven billion
The first AI model on this list to use Meta’s LLaMA as its base, Gorilla was fine-tuned to improve its ability to make API calls – or, more simply, to work with external tools. The end-to-end model is designed to serve API calls without requiring additional coding and can be integrated with other tools.
Gorilla can be used commercially in tandem with Apache 2.0-licensed LLMs.
Virtual assistants – By utilizing APIs, Gorilla can be applied to many applications. By accessing calendar APIs, for example, Gorilla could power virtual assistants: when queried, the model could return the current date without needing any further input.
Search improvements – Using natural language prompts in a search tab, Gorilla could access a search-focused API, like Wikipedia search, to return short text snippets or have an improved understanding of tasks. For example, instead of listing all files under a certain name, it would list the most recent file relevant to the context.
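The pattern in the two use cases above – a natural-language query in, an API call out – can be sketched with a toy example. This is not Gorilla’s actual pipeline (Gorilla uses a fine-tuned LLM to select calls from large sets of real API documentation); it is a minimal, hypothetical keyword router, and every name in it is illustrative.

```python
from datetime import date

# Hypothetical API registry mapping intents to callables.
# Gorilla itself learns to emit real API calls; this toy router
# only illustrates the "query in, API call out" shape of the task.
API_REGISTRY = {
    "date": lambda: date.today().isoformat(),
    "wiki": lambda topic="": f"https://en.wikipedia.org/wiki/{topic.replace(' ', '_')}",
}

def route_query(query: str):
    """Pick and invoke an API for a natural-language query via keyword match."""
    q = query.lower()
    if "date" in q or "today" in q:
        return API_REGISTRY["date"]()          # calendar-style API
    if "wikipedia" in q or "search" in q:
        topic = q.split("for")[-1].strip()     # naive topic extraction
        return API_REGISTRY["wiki"](topic)     # search-style API
    return None                                # no matching API

print(route_query("What is today's date?"))
print(route_query("Search Wikipedia for large language models"))
```

A real system would replace the keyword matching with the model’s own API selection; the registry simply stands in for the external tools Gorilla is trained to call.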
Key reading on Gorilla
Try Gorilla via Colab - https://colab.research.google.com/drive/1DEBPsccVLF_aUnmD0FwPeHFrtdC0QIUP?usp=sharing
Access the Gorilla code - https://github.com/ShishirPatil/gorilla
Claude
Parameters: Unknown (although Anthropic’s Constitutional AI paper refers to a model, AnthropicLM v4-s3, with 52 billion parameters)
Think of Claude as ChatGPT’s sensible cousin. Anthropic was founded by former OpenAI staff who left over disagreements about close ties with Microsoft.
Anthropic went on to develop Claude, a chatbot application not too dissimilar to ChatGPT apart from one thing – increased focus on safety.
Claude uses constitutional AI, a method developed by Anthropic to prevent it from generating potentially harmful outputs. The model is given a set of principles to abide by, almost like giving it a form of 'conscience.'
At the time of writing, Claude 2 is the latest version. Unveiled in July 2023, Claude 2 boasts improved performance and is capable of acting as a business or technical analyst.
Document analysis – Claude can be used to obtain insights from multiple lengthy documents or even books, with users able to ask it questions about their contents. This capability comes from Claude’s sizable context window – the span of tokens the model considers before generating an output. Claude’s context window covers 100,000 tokens of text, or around 75,000 words.
Text generation and summarization – Like ChatGPT, Claude can be prompted to generate responses to questions or generate summarizations of pieces of text.
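The figures above imply a rough rule of thumb – 100,000 tokens is about 75,000 words, i.e. roughly 0.75 words per token. A minimal sketch of checking whether a document fits in Claude’s window, assuming that heuristic; exact counts depend on Anthropic’s tokenizer:

```python
# Heuristic from the article: 100,000 tokens ≈ 75,000 words,
# i.e. about 0.75 words per token (tokenizer-dependent in practice).
WORDS_PER_TOKEN = 0.75
CLAUDE_CONTEXT_TOKENS = 100_000

def estimated_tokens(text: str) -> int:
    """Estimate a text's token count from its whitespace word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)

def fits_in_context(text: str, window: int = CLAUDE_CONTEXT_TOKENS) -> bool:
    """True if the estimated token count fits within the context window."""
    return estimated_tokens(text) <= window

doc = "word " * 60_000  # a 60,000-word document
print(estimated_tokens(doc), fits_in_context(doc))  # 80000 True
```

By this estimate, a 60,000-word document comfortably fits, while anything much beyond 75,000 words would need to be split.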
Key reading on Claude
Access Claude API - https://www.anthropic.com/product
Try Claude in Slack - https://www.anthropic.com/claude-in-slack
Access the Claude 2 beta - https://claude.ai/login
Stable Diffusion XL
Developer: Stability AI
Parameters: Base model: 3.5 billion; model ensemble pipeline: 6.6 billion
Stable Diffusion XL is the latest iteration of the text-to-image model that rose to fame in 2022. At the time of writing, 0.9 is the most up-to-date version, capable of generating hyper-realistic images.
SDXL 0.9 also boasts image-to-image capabilities, meaning users can use an image as a prompt to generate another image. Stable Diffusion XL also allows for inpainting, where it can fill in missing or damaged parts of an image, and outpainting, which extends an existing image.
Image generation – Same as the original Stable Diffusion, the XL version can be used to generate images from natural language prompts. The latest version, however, utilizes two models, the second of which is designed to add finer details to the generated outputs as part of a two-stage process.
Reimagine – Using Stability’s Clipdrop platform, Stable Diffusion XL can create multiple variations of a single image. Users simply paste or upload an image to generate possible alterations for uses such as website illustrations or concept art.
Film and TV – Stability claims SDXL generations could be used in television, music, and instructional videos, as well as “offering advancements for design and industrial use.”
Key reading on Stable Diffusion XL
Access Stable Diffusion XL 0.9 - https://huggingface.co/stabilityai/stable-diffusion-xl-base-0.9
Dolly / Dolly 2.0
Parameters: Dolly: six billion; Dolly 2.0: 12 billion
Named after Dolly the sheep, the world’s first cloned mammal, the Dolly AI models from Databricks are designed to be small and less costly to train compared to other models on this list.
Dolly, first showcased in March 2023, cost just $30 to train. It’s a fine-tuned version of EleutherAI’s GPT-J language model and is designed to be highly customizable, with users able to create their own ChatGPT-style chatbots using internal data.
Dolly 2.0 came a month later and was built using EleutherAI’s Pythia model family. The later iteration was fine-tuned on an instruction-following dataset crowdsourced among Databricks employees. It’s designed for both research and commercial use. Databricks did not say how much it cost to train Dolly 2.0, however.
Text generation and document summarization – Like ChatGPT and other models on this list, either version of Dolly can produce text when prompted in natural language. Its advantage over others comes from its customizability, with enterprises able to use the easily accessible code to build their own versions.
Key reading on Dolly
Access the Dolly code - https://github.com/databrickslabs/dolly
Access the Dolly 2.0 code - https://huggingface.co/databricks/dolly-v2-12b
XGen-7B
Parameters: Seven billion
XGen-7B is a family of large language models designed to sift through lengthy documents to extract data insights.
Salesforce researchers trained a series of seven-billion-parameter models on up to 1.5 trillion tokens using JaxFormer, Salesforce’s in-house library, as well as public-domain instructional data. The resulting models can handle sequence lengths of up to 8,000 tokens.
Data analysis – Models like Meta’s LLaMA have a maximum sequence length of only around 2,000 tokens, meaning they struggle to extract insights from lengthy unstructured data sources such as long documents. XGen-7B, however, can sift through lengthy documents with ease, extracting insights when prompted.
Code generation – The XGen-7B model utilizes Starcoder, the code-generation model from the BigCode project led by Hugging Face and ServiceNow. Starcoder’s abilities were added to support XGen’s code-generation tasks.
Chatbot conversation capabilities – When applications like ChatGPT and Bing’s AI chat first appeared, the longer users conversed with them, the more confused the underlying models became, as they were unable to handle long context lengths.
XGen could be applied to chatbots, where the ability to understand longer inputs could be a huge benefit for businesses. Salesforce’s researchers claim that a large context “allows a pre-trained LLM to look at customer data and respond to useful information-seeking queries.”
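The practical difference between a 2,000-token and an 8,000-token window can be illustrated with a simple chunking sketch: a shorter window forces a document to be split into more pieces before a model can process it. The integer token IDs below are placeholders, not real tokenizer output.

```python
def chunk_tokens(tokens, max_len):
    """Split a token sequence into consecutive chunks of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

# A 12,000-token document (placeholder integers standing in for token IDs).
doc = list(range(12_000))

short_ctx = chunk_tokens(doc, 2_000)  # LLaMA-sized window: 6 chunks
long_ctx = chunk_tokens(doc, 8_000)   # XGen-7B-sized window: 2 chunks
print(len(short_ctx), len(long_ctx))  # 6 2
```

Fewer chunks means fewer passes over the model and less risk of losing cross-chunk context, which is the advantage the XGen researchers describe.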
Key reading on XGen-7B
Access the XGen-7B codebase: https://github.com/salesforce/xGen
Access the XGen-7B model checkpoint: https://huggingface.co/Salesforce/xgen-7b-8k-base
Vicuna
Developer: LMSYS Org
Parameters: 7 billion, 13 billion
It cost LMSYS Org just $300 to train the model. Its researchers claim that Vicuna achieves more than 90% of the quality of OpenAI’s ChatGPT and Google’s Bard while outperforming other models like LLaMA and Stanford’s Alpaca. It is important to note that OpenAI has not published technical details of GPT-4, which now powers part of ChatGPT, making those findings difficult to verify.
Text generation, assistance – Like most models on this list, Vicuna can be applied to generate text and even act as a way to power a virtual assistant, with users able to prompt the bot using natural language.
Key reading on Vicuna
Access the Vicuna code - https://huggingface.co/lmsys/vicuna-13b-delta-v1.1
Inflection-1
Developer: Inflection AI
Inflection used “thousands” of Nvidia H100 GPUs to train the model, applying proprietary technical methods to achieve performance on par with the likes of DeepMind’s Chinchilla and Google’s PaLM-540B.
Inflection kept its language model work entirely in-house, from data ingestion to model design. The model will, however, be available via Inflection’s conversational API soon.
Powering personal assistants – Inflection-1 started as a way to power Pi.ai. The resulting application is intended to come across as “empathetic, useful, and safe,” according to the team behind it. Inflection-1 can also be used to generate code from a natural language description and generate answers to math questions.
Key reading on Inflection-1