Meet TinyLlama: The 550MB AI Model Trained on 3 Trillion Tokens

Early training results show promise for this minuscule but mighty model from Singapore University of Technology and Design

Ben Wodecki, Jr. Editor

September 7, 2023

2 Min Read
Made using Stable Diffusion XL (prompt: A tiny llama in a field)

At a Glance

  • Researchers are sprinting to train a miniature 550MB Llama model on a huge three-trillion-token dataset in just 90 days.
  • Dubbed TinyLlama, this compact model aims to bring performant AI to memory-constrained edge devices.

Developers are increasingly demanding smaller AI models because fewer parameters are better suited to edge devices, with their restricted memory and computational capacities. Smaller models can also be used to help decode larger models, according to former Tesla senior director of AI Andrej Karpathy.
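
The "decoding helper" role Karpathy alludes to is commonly implemented as speculative, or assisted, decoding: a small draft model proposes tokens that a larger model then verifies in a single forward pass. Below is a minimal sketch using Hugging Face transformers' assisted generation; the model IDs are illustrative placeholders, and the draft model must share the target model's tokenizer.

```python
# Minimal sketch of speculative ("assisted") decoding with Hugging Face
# transformers: a small draft model proposes tokens, the large model verifies
# them. Model IDs below are placeholders, not official release names; draft
# and target must share a tokenizer (Llama 2 and TinyLlama do).
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-2-7b-hf"     # large target model (illustrative)
draft_id = "path/to/tinyllama-checkpoint"  # small draft model (placeholder)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

inputs = tokenizer("Edge devices benefit from small models because", return_tensors="pt")
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```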

Now, an ambitious project is looking to create a minuscule pre-trained model that is nonetheless trained on trillions of tokens.

The TinyLlama project, led by a research assistant at Singapore University of Technology and Design, is trying to pre-train a 1.1 billion-parameter Llama model on a whopping three trillion tokens.

The model takes up only 550MB of RAM. The team behind it believes this compactness makes it a fit for applications with tight computation and memory budgets, enabling capabilities such as real-time machine translation without an internet connection.
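
For context, the 550MB figure is consistent with a 1.1 billion-parameter model stored at roughly four bits per weight. The quick arithmetic below is an illustrative estimate of the weights-only footprint at common precisions, not a figure from the team.

```python
# Back-of-envelope weights-only memory footprint for a 1.1B-parameter model
# at common precisions. Illustrative; real usage adds activations, KV cache
# and runtime overhead.
params = 1.1e9

for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e6:,.0f} MB")

# fp32: ~4,400 MB, fp16: ~2,200 MB, int8: ~1,100 MB, int4: ~550 MB
```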

Training kicked off on Sept. 1 using 16 A100-40G GPUs, and the team is aiming to complete it in just 90 days.
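
That schedule implies a demanding sustained throughput. The back-of-envelope calculation below is an illustration of what the cluster would have to average, not the project's reported numbers.

```python
# Back-of-envelope: sustained throughput needed to get through 3 trillion
# tokens in 90 days on 16 GPUs (illustrative estimate only).
tokens = 3e12
seconds = 90 * 24 * 3600
gpus = 16

total_tps = tokens / seconds      # ~386,000 tokens/s across the cluster
per_gpu_tps = total_tps / gpus    # ~24,000 tokens/s per A100-40G
print(f"{total_tps:,.0f} tokens/s total, {per_gpu_tps:,.0f} tokens/s per GPU")
```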

At the time of writing, the team has processed 105 billion tokens. You can monitor the training progress and cross-entropy loss here.

The model builders said they are using “exactly the same architecture and tokenizer” that Meta used to train Llama 2, so it can be dropped straight into open source projects built on Llama.
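
In practice, that compatibility means a published checkpoint should load through the standard Llama classes in libraries such as Hugging Face transformers. A minimal sketch, with the checkpoint path as a placeholder rather than an official release name:

```python
# Minimal sketch: because TinyLlama keeps Llama 2's architecture and tokenizer,
# a published checkpoint should load through the standard Llama classes.
# The repo path below is a placeholder, not an official release name.
from transformers import LlamaForCausalLM, LlamaTokenizer

ckpt = "path/to/tinyllama-checkpoint"  # placeholder

tokenizer = LlamaTokenizer.from_pretrained(ckpt)
model = LlamaForCausalLM.from_pretrained(ckpt)

inputs = tokenizer("TinyLlama is a 1.1 billion-parameter model trained on", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0], skip_special_tokens=True))
```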


The three-trillion-token dataset the TinyLlama team is using mixes SlimPajama from Cerebras Systems with StarCoderData, the dataset used to train StarCoder, the code generation model.
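
Below is a rough sketch of how such a mixture could be assembled with the Hugging Face datasets library. The dataset IDs point to the public Hub copies; the 7:3 sampling ratio and streaming setup are illustrative assumptions, not the project's published recipe.

```python
# Sketch of interleaving the two corpora with Hugging Face datasets.
# The 0.7/0.3 mixture ratio is an illustrative assumption.
from datasets import load_dataset, interleave_datasets

slimpajama = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)
starcoder = load_dataset("bigcode/starcoderdata", split="train", streaming=True)

mixed = interleave_datasets([slimpajama, starcoder], probabilities=[0.7, 0.3], seed=42)

for example in mixed.take(2):
    print(list(example.keys()))
```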

Once completed, TinyLlama would join the growing ranks of smaller language models that developers use to build applications. Also making headway are Pythia-1b from EleutherAI and MPT-1b from the Databricks-owned MosaicML.



About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
