September 7, 2023
At a Glance
- Researchers are sprinting to train a miniature 550MB Llama model on a huge three-trillion-token dataset in just 90 days.
- Dubbed TinyLlama, this compact model aims to bring performant AI to memory-constrained edge devices.
Developers increasingly want smaller AI models because fewer parameters are a better fit for edge devices, with their limited memory and compute. Smaller models can also help decode larger models, according to former Tesla senior director of AI Andrej Karpathy.
Now, an ambitious project is looking to create a minuscule pre-trained model, albeit one trained on trillions of tokens.
The TinyLlama project, led by a research assistant at the Singapore University of Technology and Design, is trying to pre-train a 1.1 billion-parameter Llama model on a whopping three trillion tokens.
The model takes up only 550MB of RAM. The team behind it believes its compactness suits the many applications that demand a small compute and memory footprint, enabling capabilities such as real-time machine translation without an internet connection.
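As a sanity check on that figure, a 550MB footprint for 1.1 billion parameters works out to half a byte per weight, i.e. roughly 4-bit quantized weights. The article does not state the storage format, so the bytes-per-parameter value below is an assumption:

```python
# Back-of-envelope check of TinyLlama's reported 550MB footprint.
# Assumption (not stated in the article): the figure corresponds to
# ~1.1B parameters stored at 4 bits (0.5 bytes) per weight.
PARAMS = 1.1e9
BYTES_PER_PARAM = 0.5  # 4-bit quantized weights (assumed)

size_mb = PARAMS * BYTES_PER_PARAM / 1e6  # megabytes (10^6 bytes)
print(f"{size_mb:.0f} MB")  # → 550 MB
```

At full 16-bit precision the same model would need roughly 2.2GB, which is why quantization matters so much for edge deployment.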
Training kicked off on Sept. 1 using 16 A100-40G GPUs, and the team is trying to complete it in just 90 days.
At the time of writing, the run has processed 105 billion tokens. The team is publicly tracking the training progress and cross-entropy loss.
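The schedule implies a demanding sustained throughput. A quick back-of-envelope calculation (these are derived numbers, not figures from the TinyLlama team) shows what 3 trillion tokens in 90 days on 16 GPUs requires:

```python
# Rough throughput needed to hit 3 trillion tokens in 90 days on 16 GPUs.
# Back-of-envelope estimates, not figures reported by the TinyLlama team.
TOTAL_TOKENS = 3e12
DAYS = 90
GPUS = 16

seconds = DAYS * 24 * 3600
tokens_per_sec = TOTAL_TOKENS / seconds   # whole-cluster rate
per_gpu = tokens_per_sec / GPUS           # per-A100 rate

print(f"cluster: {tokens_per_sec:,.0f} tokens/s")  # → ~385,802 tokens/s
print(f"per GPU: {per_gpu:,.0f} tokens/s")         # → ~24,113 tokens/s

# Progress at the time of writing: 105 billion tokens processed.
pct = 105e9 / TOTAL_TOKENS * 100
print(f"progress: {pct:.1f}%")  # → 3.5%
```

In other words, each A100 must sustain roughly 24,000 tokens per second with no slack for downtime, which is an aggressive but plausible target for a 1.1B-parameter model.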
The model builders said they are using “exactly the same architecture and tokenizer” that Meta used to train Llama 2, so it can be dropped into open source projects built on Llama.
The three-trillion-token dataset the TinyLlama team is using mixes SlimPajama from Cerebras Systems with Starcoderdata, the dataset used to train StarCoder, the code generation model.
Once complete, TinyLlama would join the growing ranks of smaller language models that developers use to build applications. Also making headway are Pythia-1b from EleutherAI and MPT-1b from the Databricks-owned MosaicML.