Is This a Breakthrough in AI Model Training?

Researchers from Fast.ai discover that large language models can learn from limited inputs

Ben Wodecki, Jr. Editor

September 20, 2023


At a Glance

  • Researchers make an intriguing discovery: AI models can rapidly memorize concepts from just one or two examples.
  • The finding defies common assumptions about LLM training and could slash training costs through greatly improved efficiency.

Large language models take a long time to train, and training them can be a very costly endeavor. However, researchers from Fast.ai may have discovered that models can rapidly memorize training examples after very few exposures.

In a technical paper published on the company’s website, the team at Fast.ai found that large language models can remember inputs after seeing them just once.

The team was fine-tuning a large language model on multiple-choice science exam questions and found the model was able to rapidly memorize examples from the dataset after initial exposure to them.

Upon recreating the experiment, the team at Fast.ai was able to back up the theory, a result that may require new thinking around how models are trained.

“It’s early days, but the experiments support the hypothesis that the models are able to rapidly remember inputs. This might mean we have to re-think how we train and use large language models,” the Fast.ai team wrote.

How does this work?

Jeremy Howard, the co-founder of Fast.ai, was working with colleague Jonathan Whitaker on a large language model for the Kaggle Science Exam competition. They were training models using a dataset compiled by Radek Osmulski, a senior data scientist at Nvidia.

After three rounds of fine-tuning, they noticed an “unusual” training loss curve, the graph that shows how a model’s error rate changes during training.
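
For readers who want to see what that looks like in practice, below is a minimal sketch, not the team’s actual code, of logging per-batch training loss during fine-tuning; `model` and `dataloader` are hypothetical placeholders for a Hugging Face-style causal language model and its data. Memorization of the kind described would show up as an abrupt step down in the curve at each epoch boundary, once every example has been seen exactly once.

```python
import torch

# Hypothetical fine-tuning loop that records per-batch training loss.
# `model` and `dataloader` are placeholders, not the Fast.ai team's setup;
# the model is assumed to return its own loss, Hugging Face-style.
def train_and_log(model, dataloader, epochs=3, lr=1e-5):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    losses = []
    for _ in range(epochs):
        for batch in dataloader:
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            losses.append(loss.item())
    return losses

# Usage sketch: near-complete memorization on first exposure would appear
# as a sharp drop in this curve at every epoch boundary.
# import matplotlib.pyplot as plt
# plt.plot(train_and_log(model, dataloader))
# plt.xlabel("batch"); plt.ylabel("training loss"); plt.show()
```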

In an explainer thread on X (Twitter), Howard said the pair had noticed similar loss curves before but had always assumed it was due to a bug.

After examining the code, they found no bug. Instead, the team at Fast.ai sought other examples of this phenomenon and found “lots of examples of similar training curves.”

Upon rerunning the tests, the team at Fast.ai achieved similar loss curves, which co-founder Howard contended “can only be explained by nearly complete memorization occurring from a single example.”

The team at Fast.ai argue that there is “no fundamental law that says that neural networks cannot learn to recognize inputs from a single example. It is just what researchers and practitioners have generally found to be the case in practice.”
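
One way to probe that claim, sketched below under the same placeholder assumptions (`model`, `example`, and `control` are hypothetical), is to measure the model’s loss on a single example before and after one gradient step on it. Near-complete single-example memorization would show the example’s loss collapsing while loss on an unseen control example barely moves.

```python
import torch

# Hypothetical probe: does one gradient step on `example` collapse its loss
# while leaving an unseen `control` example unchanged? All inputs are
# placeholder batches for a Hugging Face-style model that returns its loss.
def single_example_memorization_test(model, example, control, lr=1e-5):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    def loss_on(batch):
        with torch.no_grad():
            return model(**batch).loss.item()

    before = (loss_on(example), loss_on(control))

    model(**example).loss.backward()  # one optimization step on one example
    optimizer.step()
    optimizer.zero_grad()

    after = (loss_on(example), loss_on(control))
    # Memorization: the example's loss drops sharply; the control's barely moves.
    return before, after
```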

The findings could imply that standard practices around training neural networks over many epochs with extensive data augmentation may be unnecessary for large language models.

Instead, the team at Fast.ai propose that models learn better from fewer, more concise training examples, which could allow models to be trained more cheaply and quickly using significantly less compute.
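
In code terms the proposed change is small. As a sketch only, assuming a Hugging Face `Trainer` setup rather than the Fast.ai team’s actual pipeline, a single-pass fine-tune with no repeated exposure might look like this:

```python
from transformers import Trainer, TrainingArguments

# Sketch only: a single-pass fine-tune that relies on rapid memorization
# rather than repeated exposure. `model` and `train_dataset` are assumed
# to be supplied by the caller; this is not the Fast.ai team's pipeline.
def single_pass_finetune(model, train_dataset):
    args = TrainingArguments(
        output_dir="single-pass-finetune",
        num_train_epochs=1,              # each example is seen exactly once
        per_device_train_batch_size=8,
        learning_rate=1e-5,
    )
    Trainer(model=model, args=args, train_dataset=train_dataset).train()
```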

About the Author

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
