Meet TinyLlama: The 550MB AI Model Trained on 3 Trillion Tokens

Early training results show promise for this minuscule but mighty model from Singapore University of Technology and Design

Ben Wodecki, Jr. Editor

September 7, 2023

2 Min Read
Made using Stable Diffusion XL (prompt: A tiny llama in a field)

At a Glance

  • Researchers are sprinting to train a miniature 550MB Llama model on a huge three-trillion-token dataset in just 90 days.
  • Dubbed TinyLlama, this compact model aims to bring performant AI to memory-constrained edge devices.

Developers are increasingly demanding smaller AI models because fewer parameters are better suited to edge devices, with their restricted memory and computational capacities. Smaller models can also be used to help decode larger models, according to former Tesla senior director of AI Andrej Karpathy.
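
The "decoding helper" role Karpathy alludes to is commonly implemented as speculative, or assisted, decoding: a small draft model proposes tokens that a larger model then verifies in a single forward pass. Below is a minimal sketch using Hugging Face transformers' assisted generation; the model IDs are illustrative placeholders, and the draft model must share the target model's tokenizer.

```python
# Minimal sketch of speculative ("assisted") decoding with Hugging Face
# transformers: a small draft model proposes tokens, the large model verifies
# them. Model IDs below are placeholders, not official release names; draft
# and target must share a tokenizer (Llama 2 and TinyLlama do).
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-2-7b-hf"     # large target model (illustrative)
draft_id = "path/to/tinyllama-checkpoint"  # small draft model (placeholder)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

inputs = tokenizer("Edge devices benefit from small models because", return_tensors="pt")
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```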

Now, an ambitious project is looking to create a minuscule pre-trained model that is nonetheless trained on trillions of tokens.

The TinyLlama project, led by a research assistant at Singapore University of Technology and Design, is trying to pre-train a 1.1 billion-parameter Llama model on a whopping three trillion tokens.

The model takes up only 550MB of RAM. The team behind it believes this compactness makes it a fit for applications with tight computation and memory budgets, enabling capabilities such as real-time machine translation without an internet connection.
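
For context, the 550MB figure is consistent with a 1.1 billion-parameter model stored at roughly four bits per weight. The quick arithmetic below is an illustrative estimate of the weights-only footprint at common precisions, not a figure from the team.

```python
# Back-of-envelope weights-only memory footprint for a 1.1B-parameter model
# at common precisions. Illustrative; real usage adds activations, KV cache
# and runtime overhead.
params = 1.1e9

for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e6:,.0f} MB")

# fp32: ~4,400 MB, fp16: ~2,200 MB, int8: ~1,100 MB, int4: ~550 MB
```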

Training kicked off on Sept. 1 using 16 A100-40G GPUs, and the team is aiming to complete it in just 90 days.
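
That schedule implies a demanding sustained throughput. The back-of-envelope calculation below is an illustration of what the cluster would have to average, not the project's reported numbers.

```python
# Back-of-envelope: sustained throughput needed to get through 3 trillion
# tokens in 90 days on 16 GPUs (illustrative estimate only).
tokens = 3e12
seconds = 90 * 24 * 3600
gpus = 16

total_tps = tokens / seconds      # ~386,000 tokens/s across the cluster
per_gpu_tps = total_tps / gpus    # ~24,000 tokens/s per A100-40G
print(f"{total_tps:,.0f} tokens/s total, {per_gpu_tps:,.0f} tokens/s per GPU")
```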

At the time of writing, the team has processed 105 billion tokens. You can monitor the training progress and cross-entropy loss here.

The model builders said they are using “exactly the same architecture and tokenizer” that Meta used to train Llama 2, so it can be dropped straight into open source projects built on Llama.
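
In practice, that compatibility means a published checkpoint should load through the standard Llama classes in libraries such as Hugging Face transformers. A minimal sketch, with the checkpoint path as a placeholder rather than an official release name:

```python
# Minimal sketch: because TinyLlama keeps Llama 2's architecture and tokenizer,
# a published checkpoint should load through the standard Llama classes.
# The repo path below is a placeholder, not an official release name.
from transformers import LlamaForCausalLM, LlamaTokenizer

ckpt = "path/to/tinyllama-checkpoint"  # placeholder

tokenizer = LlamaTokenizer.from_pretrained(ckpt)
model = LlamaForCausalLM.from_pretrained(ckpt)

inputs = tokenizer("TinyLlama is a 1.1 billion-parameter model trained on", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0], skip_special_tokens=True))
```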


The three-trillion-token dataset the TinyLlama team is using mixes SlimPajama from Cerebras Systems with StarCoderData, the dataset used to train StarCoder, the code generation model.
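
Below is a rough sketch of how such a mixture could be assembled with the Hugging Face datasets library. The dataset IDs point to the public Hub copies; the 7:3 sampling ratio and streaming setup are illustrative assumptions, not the project's published recipe.

```python
# Sketch of interleaving the two corpora with Hugging Face datasets.
# The 0.7/0.3 mixture ratio is an illustrative assumption.
from datasets import load_dataset, interleave_datasets

slimpajama = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)
starcoder = load_dataset("bigcode/starcoderdata", split="train", streaming=True)

mixed = interleave_datasets([slimpajama, starcoder], probabilities=[0.7, 0.3], seed=42)

for example in mixed.take(2):
    print(list(example.keys()))
```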

Once completed, TinyLlama would join the growing ranks of smaller language models that developers use to build applications. Also making headway are Pythia-1b from EleutherAI and MPT-1b from the Databricks-owned MosaicML.



About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
