Paper: FP8 can deliver training accuracy similar to 16-bit standards

Ben Wodecki, Jr. Editor

September 20, 2022

2 Min Read

Nvidia, Intel and Arm have joined forces to create a new standard designed to optimize memory usage in deep learning applications.

The 8-bit floating point (FP8) standard was developed and evaluated across several neural network architectures, including convolutional neural networks (CNNs), recurrent neural networks (RNNs) and Transformer-based models.

The standard is also applicable to language models up to 175 billion parameters, which would cover the likes of GPT-3, OPT-175B and Bloom.

“By adopting an interchangeable format that maintains accuracy, AI models will operate consistently and performantly across all hardware platforms, and help advance the state of the art of AI,” Nvidia’s Shar Narasimhan wrote in a blog post.

Optimizing AI memory usage

When building an AI system, developers need to consider how the model’s weights are stored; these parameters encode what the system learns from its training data.

Several floating-point formats are already in use, including FP32 and FP16; the lower-precision options cut the memory needed to train a system, but often at the expense of accuracy.

The new format uses fewer bits per value than prior methods, so memory is used more efficiently; the less memory a system needs, the less computational power is required to run an application.
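As a rough illustration of that trade-off, the back-of-the-envelope sketch below (not from the paper) estimates the memory needed just to hold the weights of a 175-billion-parameter model at each width, assuming 4 bytes per FP32 value, 2 per FP16 and 1 per FP8.

```python
# Back-of-the-envelope sketch (illustrative only): memory needed just to
# store model weights at different floating-point widths.
PARAMS = 175_000_000_000  # GPT-3-scale model, as cited in the article

for fmt, bytes_per_param in [("FP32", 4), ("FP16", 2), ("FP8", 1)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{fmt}: ~{gib:,.0f} GiB for weights alone")

# FP32: ~652 GiB   FP16: ~326 GiB   FP8: ~163 GiB
```

Halving the bits per value halves the memory and bandwidth spent on weights, which is where the efficiency gain comes from; activations, gradients and optimizer state add further overhead not counted here.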

The trio outlined the new standard in a paper, which covers training and inference evaluation using the standard across a variety of tasks and models.

According to the paper, FP8 achieved “comparable accuracy” to the FP16 format across use cases, including computer vision applications.

Results on Transformer-based models like OpenAI’s DALL-E and on GAN networks saw FP8 achieve training accuracy similar to 16-bit precisions while delivering “significant speedups.”

In testing on the MLPerf Inference benchmark, Nvidia’s Hopper GPU using FP8 achieved inference times 4.5 times faster on the BERT natural language processing model.

“Using FP8 not only accelerates and reduces resources required to train but also simplifies 8-bit inference deployment by using the same datatypes for training and inference,” according to the paper.
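To illustrate what using the same datatype for training and inference means in practice, here is a minimal, purely illustrative Python sketch (not the paper’s code) that rounds values to the grid of E4M3, one of the two FP8 encodings the paper defines (4 exponent bits, 3 mantissa bits, maximum finite value 448).

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in FP8 E4M3
    (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, max finite 448)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = math.copysign(1.0, x)
    mag = abs(x)
    if mag > 448.0:                    # E4M3 has no infinities; clamp to max finite
        return sign * 448.0
    exp = max(math.floor(math.log2(mag)), -6)   # clamp into the subnormal range
    step = 2.0 ** (exp - 3)            # 3 mantissa bits -> 8 steps per binade
    return sign * round(mag / step) * step

# The same rounding is applied when storing weights during training and
# when deploying the model for inference.
weights = [0.1234, -2.5, 300.0, 1e-5]
print([quantize_e4m3(w) for w in weights])   # [0.125, -2.5, 288.0, 0.0]
```

Because training already runs in this representation, the quantized weights can be reused directly at inference time, which is the deployment simplification the paper describes.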

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
