TinyLlama: The Mini AI Model with a Trillion-Token Punch
Weighing in at under 640 megabytes, the TinyLlama model was trained on a trillion tokens and outperforms similar-sized rivals
At a Glance
- The new TinyLlama has launched after months of training.
- It brings the capabilities of models trained on trillions of tokens into a mobile-friendly package suitable for edge devices.
It is finally here: The hotly anticipated open source model TinyLlama has dropped, and it is compact yet powerful.
The TinyLlama project kicked off last September, with a group of developers attempting to train a minuscule model on trillions of tokens. After much work and a few setbacks, the TinyLlama team has now released the model. It is 1.1 billion parameters in size and was trained on around one trillion tokens for approximately three epochs, or cycles through the training data.
According to the paper outlining the model, the finished TinyLlama outperforms existing open source language models of comparable sizes, including Pythia-1.4B, OPT-1.3B and MPT-1.3B.
Potential use cases for TinyLlama include deployment on edge devices, since the model only takes up 637 MB. It could even be used to assist speculative decoding of larger models; the team behind it references a tutorial by former Tesla senior director of AI Andrej Karpathy, who is now at OpenAI.
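In a speculative-decoding setup, the small model drafts candidate tokens that the larger model then verifies in a single forward pass. The sketch below shows one way this could look using the assisted-generation feature in Hugging Face's transformers library; the model IDs and the pairing with Llama 2 7B are illustrative assumptions, not a setup described by the TinyLlama team.

```python
# Minimal sketch: TinyLlama as the draft model in speculative (assisted) decoding.
# The draft model proposes several tokens per step and the larger target model
# verifies them in one pass, so the final output matches what the target model
# would have produced on its own. Model IDs below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-2-7b-hf"           # assumed large target model
draft_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed small draft model

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_id, device_map="auto")

inputs = tokenizer("Speculative decoding works by", return_tensors="pt").to(target.device)

# transformers' assisted generation uses the draft model to speed up decoding.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because assisted generation requires the draft and target models to share a tokenizer, TinyLlama's Llama 2 compatibility is what makes a pairing like this possible.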
The official TinyLlama. Credit: TinyLlama Project team
The model itself is designed to be a compact version of Llama 2, Meta's open source language model, using the same architecture and tokenizer, meaning it can be dropped into projects built on Llama with little or no modification.
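As a rough illustration of that compatibility, the sketch below loads TinyLlama through the standard Llama classes in Hugging Face's transformers library; the checkpoint name is assumed for illustration, and the same code path applies to any Llama 2 checkpoint.

```python
# Minimal sketch: loading TinyLlama through the standard Llama 2 code path in
# Hugging Face transformers. The checkpoint name is an assumption for
# illustration; because TinyLlama shares Llama 2's architecture and tokenizer,
# the same classes and calls apply unchanged.
from transformers import AutoTokenizer, LlamaForCausalLM

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)  # same class used for Llama 2

prompt = "In one sentence, what is a language model?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```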
Despite its small stature, TinyLlama can be used for downstream tasks, with the team behind it touting it as “an attractive platform for researchers and practitioners in language model research.”
For example, Apple machine learning research scientist Awni Hannun fine-tuned TinyLlama with LoRA locally using just an 8GB Mac Mini via MLX, Apple's open source machine learning framework.
“With its compact architecture and promising performance, TinyLlama can enable end-user applications on mobile devices, and serve as a lightweight platform for testing a wide range of innovative ideas related to language models,” the team behind the mini model said.
And more TinyLlama is on the way, with the developers planning “improved versions” aimed at expanding its performance and versatility across various tasks.
Access TinyLlama
You can download TinyLlama for free via GitHub. All model checkpoints are also available. TinyLlama is suitable for commercial use as per its Apache-2.0 license.
The team behind the model recommends using the fine-tuned chat version of TinyLlama at present, as the learning rate of the base model “has not cooled down yet.”
Smaller models on the rise
A recent wave of smaller AI models has begun to emerge, with companies looking to cut hardware running costs.
Microsoft, for example, has its Phi project, working on diminutive models a few billion parameters in size but capable of beating the bigger boys. Phi-2, launched last December, outperformed models up to 25 times its size.
Set to release soon is Gemini Nano, the newly announced small version of Google's flagship Gemini foundation model, which will stand at around 3.2 billion parameters when it drops later this year.
According to Bradley Shimmin, chief analyst, AI and data analytics at sister research firm Omdia, these smaller models perform well as they are trained on synthetic data generated by larger models.
“Synthetic data is already driving a great deal of innovation that we are seeing coming out of the generative AI space itself, wherein you have so many of these smaller models that are right now wowing people through their capabilities that match those of frontier models like OpenAI’s GPT.”