Nvidia, Intel develop memory-optimizing deep learning training standard
Paper: FP8 can deliver training accuracy similar to 16-bit standards
September 20, 2022

Nvidia, Intel and Arm have joined forces to create a new standard designed to optimize memory usage in deep learning applications.
The 8-bit floating point (FP8) standard was developed and evaluated across several neural network architectures, including convolutional neural networks (CNNs), recurrent neural networks (RNNs) and Transformer-based models.
It also applies to language models of up to 175 billion parameters, which would cover the likes of GPT-3, OPT-175B and Bloom.
“By adopting an interchangeable format that maintains accuracy, AI models will operate consistently and performantly across all hardware platforms, and help advance the state of the art of AI,” Nvidia’s Shar Narasimhan wrote in a blog post.
Optimizing AI memory usage
When building an AI system, developers need to consider how the system’s weights, the numerical parameters that encode what it learns from its training data, are stored.
Several floating point formats are currently used for this, including FP32 and FP16; lower-precision formats reduce the volume of memory required to train a system, but typically at the cost of accuracy.
The new approach uses fewer bits per value than these prior formats to make more efficient use of memory; the less memory a system consumes, the less computational power is needed to run an application.
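As a back-of-the-envelope illustration of why the bit width matters, the sketch below (illustrative arithmetic only, not taken from the paper) compares the memory needed just to hold the weights of a 175-billion-parameter model at each precision, ignoring optimizer state and activations.

```python
# Illustrative arithmetic only: memory to store the weights of a
# 175-billion-parameter model (optimizer state and activations excluded).
PARAMS = 175e9

for name, bytes_per_value in [("FP32", 4), ("FP16", 2), ("FP8", 1)]:
    gigabytes = PARAMS * bytes_per_value / 1e9
    print(f"{name}: {gigabytes:.0f} GB")  # FP32: 700 GB, FP16: 350 GB, FP8: 175 GB
```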
The trio outlined the new standard in a paper, which evaluates it on training and inference across a variety of tasks and models.
According to the paper, FP8 achieved “comparable accuracy” to FP16 format across use cases and applications including computer vision.
Results on Transformer and GAN networks, such as OpenAI’s DALL-E, showed FP8 achieving training accuracy similar to 16-bit precisions while delivering “significant speedups.”
In testing on the MLPerf Inference benchmark, Nvidia’s Hopper architecture using FP8 ran 4.5x faster on the BERT model for natural language processing.
“Using FP8 not only accelerates and reduces resources required to train but also simplifies 8-bit inference deployment by using the same datatypes for training and inference,” according to the paper.
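To illustrate that point about sharing one datatype between training and deployment, here is a minimal sketch assuming a recent PyTorch build that exposes an 8-bit floating point dtype (torch.float8_e4m3fn); the dtype name and workflow are this article's illustration, not the paper's reference implementation.

```python
import torch

# Assumption: PyTorch >= 2.1, which exposes torch.float8_e4m3fn.
# Trained weights, here in FP16 for the sake of the example.
weights_fp16 = torch.randn(4096, 4096, dtype=torch.float16)

# Cast once to FP8 for storage and deployment: half the memory of FP16.
weights_fp8 = weights_fp16.to(torch.float8_e4m3fn)
print(weights_fp8.element_size())  # 1 byte per value, vs. 2 for FP16

# At inference time, upcast to a compute-friendly precision as needed.
activations = torch.randn(1, 4096, dtype=torch.float16)
output = activations @ weights_fp8.to(torch.float16)
```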