MLCommons on Storage Benchmarks for AI/ML Applications

An opinion piece by a software architect at Panasas who co-chairs the storage working group at MLCommons, a consortium founded by Big Tech

Curtis Anderson

October 5, 2022


We often say that the world runs on data, and nowhere is that more true than in artificial intelligence and machine learning (AI/ML), where data is quite literally the only thing algorithms use to tell a picture of a dog from a picture of a bear, or a car from a submarine.

Until the emergence of modern graphics processing units (GPUs) with extreme computational power, the neural networks and AI/ML applications in use today were nothing more than abstract theory. Today, organizations of all shapes and sizes are exploring AI/ML applications, so building a high-performance system capable of supporting them has become a top priority.

Unlike in other high-performance applications, data in these environments is always "hot": there is no such thing as "cold" data in AI. The more data available, the more accurate the results. To achieve the highest possible accuracy, neural networks are continuously retrained, which means they need to be fed more and more data.

And since GPUs and other new accelerators are typically very expensive, they need to be kept as busy as possible to achieve the best return on investment. Storage systems therefore play an indispensable role in keeping the beast fed – ensuring that the next set of data is ready and available before the accelerator has completed computing on the current set.  
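Having the next set of data ready before the accelerator finishes the current one is what ML data loaders usually call prefetching. Below is a minimal Python sketch of the pattern, assuming a hypothetical `prefetch` helper that loads batches on a background thread into a bounded queue; `load_batches` is likewise a hypothetical stand-in that simulates storage latency with `time.sleep`. It illustrates the idea only and is not code from MLCommons or any particular framework.

```python
import queue
import threading
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

_DONE = object()  # marks the end of the stream


class _Failure:
    """Carries an exception from the loader thread to the consumer."""

    def __init__(self, exc: BaseException) -> None:
        self.exc = exc


def prefetch(batches: Iterable[T], depth: int = 2) -> Iterator[T]:
    """Yield items from `batches` while loading up to `depth` items ahead."""
    buf: queue.Queue = queue.Queue(maxsize=depth)

    def loader() -> None:
        try:
            for item in batches:
                buf.put(item)  # blocks while the buffer is full
        except BaseException as exc:
            buf.put(_Failure(exc))  # hand the error to the consumer
        buf.put(_DONE)

    threading.Thread(target=loader, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item


def load_batches(n: int) -> Iterator[str]:
    """Hypothetical stand-in for reading n batches from storage."""
    for i in range(n):
        time.sleep(0.01)  # simulated storage latency
        yield f"batch-{i}"


if __name__ == "__main__":
    for batch in prefetch(load_batches(3)):
        time.sleep(0.01)  # simulated accelerator compute on the current batch
        print(batch)
```

The `depth` parameter bounds how far ahead loading runs: a deeper buffer absorbs more variation in storage latency at the cost of holding more batches in memory. Prefetching only hides latency, though; if storage cannot sustain the accelerator's average consumption rate, the buffer eventually drains.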

However, feeding the beast is not just about moving data from A to B as quickly as possible. The key requirement is that the accelerator always has data available when it needs it. The storage system should therefore be carefully matched to the compute accelerators, so that it provides neither too much nor too little capacity and performance.
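One way to make "neither too much nor too little" concrete is a back-of-the-envelope estimate: storage must sustain roughly (number of accelerators) × (samples per second each accelerator consumes) × (bytes per sample). The Python sketch below assumes reads are the only bottleneck and ignores caching, decoding, and metadata overhead; `TrainingJob` and `storage_balance` are hypothetical names, and the numbers in the example are illustrative, not measurements.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TrainingJob:
    accelerators: int        # number of GPUs or other accelerators
    samples_per_sec: float   # samples one accelerator can consume per second
    bytes_per_sample: float  # average stored size of one training sample

    def required_bandwidth(self) -> float:
        """Bytes per second storage must deliver to keep every accelerator busy."""
        return self.accelerators * self.samples_per_sec * self.bytes_per_sample


def storage_balance(job: TrainingJob, storage_bandwidth: float) -> Tuple[float, float]:
    """Return (accelerator utilization cap, storage headroom ratio).

    A utilization cap below 1.0 means storage is the bottleneck; headroom
    well above 1.0 means storage bandwidth is overbuilt for this job.
    """
    ratio = storage_bandwidth / job.required_bandwidth()
    return min(1.0, ratio), ratio


if __name__ == "__main__":
    # Hypothetical job: 8 accelerators, 2,000 samples/s each, ~110 KB per sample.
    job = TrainingJob(accelerators=8, samples_per_sec=2000, bytes_per_sample=110_000)
    print(f"required: {job.required_bandwidth() / 1e9:.2f} GB/s")  # 1.76 GB/s
    for bw in (1.2e9, 1.8e9, 6.0e9):
        cap, headroom = storage_balance(job, bw)
        print(f"{bw / 1e9:.1f} GB/s -> utilization cap {cap:.0%}, headroom {headroom:.2f}x")
```

In this hypothetical case, 1.2 GB/s of storage would cap the accelerators at about 68% busy, while 6.0 GB/s would leave more than three times the needed bandwidth idle. Either outcome wastes money on one side of the system.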

Another crucial and distinctive consideration for AI/ML storage is that data must be delivered to the compute accelerator in random order. Think of a neural network being trained to identify potentially cancerous skin moles. If it were fed all the cancerous images in the dataset followed by all the benign images, it would likely detect a pattern in the sequence of the data rather than a pattern in the data itself.

The result would be very low accuracy when the model is later used on a new image to infer whether a particular mole may need a biopsy. This is why data needs to be constantly reordered when training the neural network: randomizing the sequence, and then changing that randomization on each pass (epoch) through the dataset, is critical to attaining accurate results.
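As a minimal sketch of per-epoch reshuffling, assuming a hypothetical `epoch_orders` helper and a toy dataset stored in label order, the Python below draws a fresh random permutation of sample indices for every epoch from a seeded generator, so each epoch sees every sample exactly once but in a different order.

```python
import random
from typing import List


def epoch_orders(num_samples: int, epochs: int, seed: int = 0) -> List[List[int]]:
    """Return a fresh random permutation of sample indices for each epoch."""
    rng = random.Random(seed)  # seeded so the run can be reproduced
    orders = []
    for _ in range(epochs):
        order = list(range(num_samples))
        rng.shuffle(order)  # a new permutation every epoch, not a fixed one
        orders.append(order)
    return orders


if __name__ == "__main__":
    # Hypothetical dataset stored in label order: all cancerous, then all benign.
    labels = ["cancerous"] * 5 + ["benign"] * 5
    for epoch, order in enumerate(epoch_orders(len(labels), epochs=3)):
        print(epoch, [labels[i][0] for i in order])  # first letter of each label
```

Mainstream frameworks expose this directly; for example, PyTorch's `DataLoader(..., shuffle=True)` draws a new order each epoch. From the storage system's point of view, the consequence is that what would otherwise be a sequential scan becomes a stream of scattered reads.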

In addition, many organizations run multiple types of AI/ML projects, including natural language processing, image recognition, and recommendation systems. Each of these projects relies on a different neural network architecture that requires different data and accesses that data in different ways. This means that a given accelerator or storage system may be better suited to one workload than to others.

With stretched budgets and growing data volumes, there is a lot of pressure to make the right decision when purchasing storage and building your data infrastructure. This is where benchmarks can help. By providing an objective way to measure the performance of a solution, benchmarks enable organizations to test several options and identify the one that best fits their environment and needs.  

While there are many storage benchmarks available, most of them are currently unsuitable for AI/ML applications because they do not account for the need to randomize the input stream as discussed earlier. That need is precisely what we are addressing in the storage working group of MLCommons, an open engineering consortium whose founding members include Alibaba, Facebook AI, Google, Intel, Nvidia as well as academic researchers from top universities around the world.
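To illustrate the distinction, the toy Python sketch below reads the same fixed-size records twice, once sequentially and once in a shuffled order, and times each pass. It is an assumption-laden illustration (hypothetical record count and size, one small local file, hypothetical function names), not the MLPerf storage benchmark or its methodology. Because the file is freshly written, the timings mostly reflect the operating system's page cache; a real benchmark would use datasets larger than memory and control for caching.

```python
import os
import random
import tempfile
import time
from typing import Dict, List, Tuple


def read_records(path: str, record_size: int, order: List[int]) -> int:
    """Read fixed-size records from `path` in the given order; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        for idx in order:
            f.seek(idx * record_size)
            total += len(f.read(record_size))
    return total


def compare_access_patterns(
    num_records: int = 1000, record_size: int = 4096, seed: int = 0
) -> Dict[str, Tuple[int, float]]:
    """Time sequential vs. shuffled reads of the same file.

    Returns {pattern: (bytes_read, seconds)}. Results on a freshly written
    file mostly reflect the OS page cache, not the storage device.
    """
    sequential = list(range(num_records))
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)
    results: Dict[str, Tuple[int, float]] = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dataset.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(num_records * record_size))
        for name, order in (("sequential", sequential), ("shuffled", shuffled)):
            start = time.perf_counter()
            nbytes = read_records(path, record_size, order)
            results[name] = (nbytes, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    for name, (nbytes, secs) in compare_access_patterns().items():
        print(f"{name:>10}: {nbytes} bytes in {secs * 1000:.2f} ms")
```

Both passes move exactly the same bytes; only the order differs. A benchmark that measures just the sequential pass says little about how a system behaves under the shuffled pattern that training actually produces.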

MLCommons produces benchmarks known as MLPerf that characterize the performance of GPUs and silicon accelerators for both the training and inference of several different classes of AI/ML applications and neural network architectures. The MLPerf benchmarks are designed to help organizations identify the number and type of accelerators they need to purchase for their particular use cases.  

The goal of the MLCommons Storage working group is to identify the types and performance levels of the storage systems required to support various AI/ML applications. We are creating a new MLPerf benchmark that uses the same terminology and is tailored to the same neural network architecture types as the other MLPerf benchmarks.

This will ultimately help organizations ensure that they acquire the storage capacity and performance they need to build a balanced environment without making the costly mistake of overbuilding or bottlenecking in any single part of the broader system.  
