Early training results show promise for this minuscule but mighty model from Singapore University of Technology and Design
Developers are increasingly demanding smaller AI models, since models with fewer parameters are better suited to edge devices and their limited memory and compute. Smaller models can also be used to help decode larger models, according to former Tesla senior director of AI Andrej Karpathy.
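Karpathy's point likely alludes to techniques such as speculative decoding, where a small "draft" model proposes tokens that a larger model then verifies. As a rough sketch, Hugging Face transformers exposes this as assisted generation via the `assistant_model` argument; the checkpoint ids below are illustrative placeholders, not actual project releases.

```python
# A minimal sketch of speculative/assisted decoding with Hugging Face
# transformers: a small draft model proposes tokens and the large model
# verifies them, speeding up generation. Model ids are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
large = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
draft = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B")  # hypothetical id

inputs = tokenizer("Small language models are useful because", return_tensors="pt")
# Passing assistant_model turns on assisted generation: the draft model
# generates candidate tokens that the large model accepts or rejects.
outputs = large.generate(**inputs, assistant_model=draft, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This works because the two models share a tokenizer, which, as the TinyLlama team notes below, is by design.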
Now, an ambitious project is looking to create a minuscule pre-trained model, one trained on trillions of tokens.
The TinyLlama project, led by a research assistant at Singapore University of Technology and Design, is trying to pre-train a 1.1 billion-parameter Llama model on a whopping three trillion tokens.
The model takes up only 550MB of RAM. The team behind it believes this compactness will suit the many applications that demand a small compute and memory footprint, enabling capabilities such as real-time machine translation without an internet connection.
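The 550MB figure is consistent with storing the 1.1 billion weights at 4-bit precision; that is an inference from the numbers, not a detail confirmed by the team. A quick back-of-envelope check:

```python
# Back-of-envelope memory check (assumption: 550MB corresponds to 4-bit weights).
params = 1.1e9          # 1.1 billion parameters
bytes_per_param = 0.5   # 4 bits = half a byte per weight
print(params * bytes_per_param / 1e6, "MB")  # -> 550.0 MB
```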
Training kicked off on Sept. 1 on 16 A100-40G GPUs, and the team is aiming to complete the run in just 90 days.
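Those numbers imply a demanding sustained throughput. A rough calculation, ignoring checkpointing, restarts, and other overhead:

```python
# Throughput the 90-day target implies for 3 trillion tokens on 16 GPUs.
tokens = 3e12                  # three trillion training tokens
seconds = 90 * 24 * 3600       # 90 days
gpus = 16                      # A100-40G cards

total_rate = tokens / seconds  # ~386,000 tokens/sec across the cluster
per_gpu = total_rate / gpus    # ~24,000 tokens/sec per GPU
print(f"{total_rate:,.0f} tokens/s total, {per_gpu:,.0f} tokens/s per GPU")
```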
At the time of writing, the team has processed 105 billion tokens. You can monitor the training run and its cross-entropy loss here.
The model builders said they are using “exactly the same architecture and tokenizer” that Meta used to train Llama 2, so the model can be dropped straight into open source projects built on Llama.
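In practice, that compatibility means any loader that already handles Llama 2 should accept TinyLlama unchanged. A minimal sketch with Hugging Face transformers follows; the checkpoint id is a placeholder for illustration:

```python
# Because the architecture and tokenizer match Llama 2 exactly, the same
# Llama classes load the model with no code changes.
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "TinyLlama/TinyLlama-1.1B"  # placeholder, not an official release name
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)
```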
The three-trillion-token dataset the TinyLlama team is using is a mix of SlimPajama from Cerebras Systems and StarCoderData, the dataset used to train StarCoder, the code generation model.
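For a sense of how such a mix can be assembled, here is a sketch using the Hugging Face datasets library; the 0.7/0.3 sampling ratio is an assumption for illustration, not a figure from the project.

```python
# Sketch: stream and interleave the two corpora into one training mix.
# The sampling probabilities below are assumed, not the project's actual ratio.
from datasets import load_dataset, interleave_datasets

slimpajama = load_dataset("cerebras/SlimPajama-627B", split="train", streaming=True)
starcoder = load_dataset("bigcode/starcoderdata", split="train", streaming=True)

mixed = interleave_datasets([slimpajama, starcoder], probabilities=[0.7, 0.3], seed=42)
```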
Once training is complete, TinyLlama will join the growing ranks of small language models that developers use to build applications. Also making headway are Pythia-1b from EleutherAI and MPT-1b from Databricks-owned MosaicML.