TinyLlama: The Mini AI Model with a Trillion-Token Punch
Weighing in at under 640 megabytes, the TinyLlama model was trained on a trillion tokens and outperforms similar-sized rivals
At a Glance
- The new TinyLlama has launched after months of training.
- It brings the capabilities of models trained on trillions of tokens into a mobile-friendly package suitable for edge devices.
It is finally here: The hotly anticipated open source model TinyLlama has dropped, and it is compact yet powerful.
The TinyLlama project kicked off last September, with a group of developers attempting to train a minuscule model on trillions of tokens. After much work and a few setbacks, the TinyLlama team has now released the model. It is 1.1 billion parameters in size and was trained on around one trillion tokens for approximately three epochs, or cycles through the training data.
According to the paper outlining the model, the finished TinyLlama outperforms existing open source language models of comparable sizes, including Pythia-1.4B, OPT-1.3B and MPT-1.3B.
Potential use cases for TinyLlama include deployment on edge devices, since the model only takes up 637 MB. It could even be used to assist speculative decoding of larger models; the team behind it references a tutorial by former Tesla senior director of AI Andrej Karpathy, who is now at OpenAI.
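In a speculative-decoding setup, the small model drafts candidate tokens that the larger model then verifies in a single forward pass. The sketch below shows one way this could look using the assisted-generation feature in Hugging Face's transformers library; the model IDs and the pairing with Llama 2 7B are illustrative assumptions, not a setup described by the TinyLlama team.

```python
# Minimal sketch: TinyLlama as the draft model in speculative (assisted) decoding.
# The draft model proposes several tokens per step and the larger target model
# verifies them in one pass, so the final output matches what the target model
# would have produced on its own. Model IDs below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Llama-2-7b-hf"           # assumed large target model
draft_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed small draft model

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_id, device_map="auto")

inputs = tokenizer("Speculative decoding works by", return_tensors="pt").to(target.device)

# transformers' assisted generation uses the draft model to speed up decoding.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because assisted generation requires the draft and target models to share a tokenizer, TinyLlama's Llama 2 compatibility is what makes a pairing like this possible.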
The official TinyLlama. Credit: TinyLlama Project team
The model itself is designed to be a compact version of Llama 2, Meta's open source language model, using the same architecture and tokenizer, meaning it can be dropped into projects built on Llama with little or no modification.
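As a rough illustration of that compatibility, the sketch below loads TinyLlama through the standard Llama classes in Hugging Face's transformers library; the checkpoint name is assumed for illustration, and the same code path applies to any Llama 2 checkpoint.

```python
# Minimal sketch: loading TinyLlama through the standard Llama 2 code path in
# Hugging Face transformers. The checkpoint name is an assumption for
# illustration; because TinyLlama shares Llama 2's architecture and tokenizer,
# the same classes and calls apply unchanged.
from transformers import AutoTokenizer, LlamaForCausalLM

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)  # same class used for Llama 2

prompt = "In one sentence, what is a language model?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```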
Despite its small stature, TinyLlama can be used for downstream tasks, with the team behind it touting it as “an attractive platform for researchers and practitioners in language model research.”
For example, Apple machine learning research scientist Awni Hannun fine-tuned TinyLlama with LoRA locally using just an 8GB Mac Mini via MLX, Apple's open source machine learning framework.
“With its compact architecture and promising performance, TinyLlama can enable end-user applications on mobile devices, and serve as a lightweight platform for testing a wide range of innovative ideas related to language models,” the team behind the mini model said.
And more TinyLlama is on the way, with the developers planning “improved versions” aimed at expanding its performance and versatility across various tasks.
Access TinyLlama
You can download TinyLlama for free via GitHub. All model checkpoints are also available. TinyLlama is suitable for commercial use as per its Apache-2.0 license.
The team behind the model recommends using the fine-tuned chat version of TinyLlama at present, as the learning rate of the base model “has not cooled down yet.”
Smaller models on the rise
A recent wave of smaller AI models has begun to emerge, with companies looking to cut hardware running costs.
Microsoft, for example, has its Phi project, working on diminutive models a few billion parameters in size but capable of beating the bigger boys. Phi-2, launched last December, outperformed models up to 25 times its size.
Set to release soon is Gemini Nano, the newly announced small version of Google's flagship Gemini foundation model, which will stand at around 3.2 billion parameters when it drops later this year.
According to Bradley Shimmin, chief analyst, AI and data analytics at sister research firm Omdia, these smaller models perform well as they are trained on synthetic data generated by larger models.
“Synthetic data is already driving a great deal of innovation that we are seeing coming out of the generative AI space itself, wherein you have so many of these smaller models that are right now wowing people through their capabilities that match those of frontier models like OpenAI’s GPT.”