September 20, 2023
At a Glance
- Researchers make an intriguing discovery: AI models can rapidly memorize concepts from just one or two examples.
- The finding defies common assumptions about how LLMs learn and could slash training costs through greatly improved efficiency.
Large language models take a long time to train – and doing so can be a very costly endeavor. However, researchers from Fast.ai may have discovered that models can rapidly memorize examples after very few exposures.
In a technical paper published on the company’s website, the team at Fast.ai found that large language models can remember inputs after seeing them just once.
The team was fine-tuning a large language model on multiple-choice science exam questions and found the model rapidly memorized examples from the dataset after its initial exposure to them.
Upon recreating the experiment, the team at Fast.ai was able to reproduce the result – potentially necessitating new thinking around model training.
“It’s early days, but the experiments support the hypothesis that the models are able to rapidly remember inputs. This might mean we have to re-think how we train and use large language models,” the Fast.ai team wrote.
How does this work?
Jeremy Howard, the co-founder of Fast.ai, was working with colleague Jonathan Whitaker on a large language model for the Kaggle Science Exam competition. They were training models using a dataset compiled by Radek Osmulski, a senior data scientist at Nvidia.
After three rounds of fine-tuning, they noticed an “unusual” training loss curve – the graph that shows how a model’s error rate changes during training.
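To make the idea of a loss curve concrete, here is a minimal, self-contained sketch (our illustration, not Fast.ai’s code): a one-parameter linear model fit by gradient descent, recording the training loss after every step. Plotting `losses` against the step index gives the kind of curve the researchers were inspecting; ordinary training produces a smooth, gradual decline, which is what made the abrupt drops they saw stand out.

```python
def train(steps=50, lr=0.1):
    # Toy data generated from y = 3x, a single input feature.
    data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
    w = 0.0  # model parameter, starts far from the true value 3.0
    losses = []
    for _ in range(steps):
        # Mean squared error over the dataset and its gradient w.r.t. w.
        loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
        losses.append(loss)  # one point on the training loss curve
    return w, losses

w, losses = train()
# losses[0] is large, losses[-1] is near zero: a smoothly declining curve.
```

In a real fine-tuning run the loss would come from a deep network over batches of text, but the bookkeeping is the same: one loss value recorded per step, plotted over time.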
In an explainer thread on X (Twitter), Howard said the pair had noticed similar loss curves before but had always assumed they were due to a bug.
After examining the code, they found no bug. Instead, the team at Fast.ai sought out other examples of this phenomenon and found “lots of examples of similar training curves.”
Upon re-running the tests, the team at Fast.ai observed similar loss curves, which co-founder Howard contended “can only be explained by nearly complete memorization occurring from a single example.”
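A deliberately simplified sketch of what “memorization from a single example” means (again our illustration, not the team’s experiment): an over-parameterized model that keeps one weight per training example. With enough capacity, a single gradient step on an example can drive that example’s loss to zero – the example is effectively memorized after one exposure. The function names and the learning rate here are hypothetical choices for the toy.

```python
def example_loss(pred, target):
    # Squared error on a single example.
    return (pred - target) ** 2

def one_step_memorize(weights, example_id, target, lr=0.5):
    # One gradient-descent step on this example's dedicated weight.
    pred = weights.get(example_id, 0.0)
    grad = 2.0 * (pred - target)          # d(loss)/d(weight)
    weights[example_id] = pred - lr * grad
    return weights

weights = {}
target = 7.0
before = example_loss(weights.get("q42", 0.0), target)  # loss on first sight
weights = one_step_memorize(weights, "q42", target)
after = example_loss(weights["q42"], target)            # loss after one step
# before = 49.0, after = 0.0 -- a near-vertical drop in the loss curve.
```

Real LLMs do not keep a weight per example, but they are over-parameterized enough that a loosely analogous sudden drop in per-example loss is what Howard’s curves suggest.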
The team at Fast.ai argues that there is “no fundamental law that says that neural networks cannot learn to recognize inputs from a single example. It is just what researchers and practitioners have generally found to be the case in practice.”
The findings could imply that standard practices around training neural networks over many epochs with extensive data augmentation may be unnecessary for large language models.
Instead, the team at Fast.ai proposes that models learn better from fewer, more concise training examples – which could allow models to be trained faster and more cheaply using significantly less compute.