AI in the data center: FPGAs for workload acceleration

Myrtle.ai is improving AI processing using clever software and FPGA-based accelerator cards

Guy Matthews

October 26, 2021


Cambridge has good reason to be regarded as the AI capital of the UK.

It seems to produce more world-beating AI businesses than any other region of the country and has a reputation in the field that resonates around the world.

It is a fitting home, then, for Myrtle.ai, a specialist in deep learning acceleration solutions.

Myrtle has assembled a team of experts in the development of low-power inference circuits and software and caters to a customer base that includes Intel and Jaguar Land Rover.

To establish what makes the company tick, and where deep learning is headed, we interviewed Peter Baldwin, CEO of Myrtle.ai.

AIB: What are the key challenges faced by companies trying to run ML at scale?

PB: Typically, they’re trying to get more out of their existing infrastructure.

They want to get the best ML model to run within tight latency bounds and at good performance.

That requires full-stack optimization. But very often there's a mismatch between what current data center infrastructure can provide and what ML inference workloads need.

Existing infrastructure can be very inefficient when used to deploy algorithms that were not conceived of when the hardware was designed.

This makes it hard for companies to increase their capability quickly in response to rapidly evolving market needs.

Without a change in strategy, they’re looking at a major investment in new servers, possibly whole new buildings, and new energy supplies.

The resulting costs and delays can be daunting.

These challenges are compounded when applications are real-time and require inference latencies of only a few milliseconds, because processing efficiency falls significantly at the small batch sizes that such tight latency budgets allow.
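The batch-size effect can be illustrated with a toy cost model. The Python below is a hypothetical sketch, not a Myrtle.ai method: it assumes each batch pays a fixed overhead plus a per-item cost (all numbers are invented for illustration) and finds the largest batch that still fits a latency budget, which is one way to read "latency-bounded throughput".

```python
import math

# Hypothetical linear cost model for one inference batch. The numbers
# used below are illustrative assumptions, not Myrtle.ai measurements.
def batch_latency_ms(batch, overhead_ms, per_item_ms):
    """Fixed per-batch overhead plus a per-item processing cost."""
    return overhead_ms + per_item_ms * batch

def throughput_per_s(batch, overhead_ms, per_item_ms):
    """Items processed per second when running batches of this size."""
    return 1000.0 * batch / batch_latency_ms(batch, overhead_ms, per_item_ms)

def latency_bounded_throughput(budget_ms, overhead_ms, per_item_ms):
    """Largest batch that still meets the latency budget, and its throughput."""
    max_batch = math.floor((budget_ms - overhead_ms) / per_item_ms)
    if max_batch < 1:
        return 0, 0.0  # even a single item misses the budget
    return max_batch, throughput_per_s(max_batch, overhead_ms, per_item_ms)

# Assumed: 2 ms fixed overhead, 0.25 ms per item, 5 ms latency budget.
print(round(throughput_per_s(1, 2.0, 0.25), 1))    # 444.4
print(latency_bounded_throughput(5.0, 2.0, 0.25))  # (12, 2400.0)
```

Under these assumed numbers, serving one item at a time delivers less than a fifth of the throughput the 5 ms budget permits. Shrinking either cost term raises the batch size, and therefore the throughput, that fits inside the same latency window.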

AIB: How would you describe Myrtle.ai’s mission?

PB: In one word, efficiency.

By adding our products to existing servers, companies can significantly increase the latency-bounded throughput of their data center infrastructure, and so scale capability rapidly at much lower cost.

We’ve demonstrated Capex savings of 50 percent and energy savings of nearly 80 percent in some real-world hyperscale AI and ML applications.

Take, for example, our SEAL Accelerator Module for recommendation models.

Recommendation models are among the most common data center workloads. They're used to rank search results, adverts, feeds and personalized content in many different types of recommendation systems.

The more ranking you can do in a tight latency window, the more relevant the result, and that leads to increased revenue.

Unfortunately, the compute resources in a typical infrastructure are heavily under-utilized when processing recommendation models, because of memory bandwidth limitations.

SEAL removes these bandwidth limitations by optimizing the sparse features of the models.

This allows compute resources to be fully utilized and the system to run at maximum efficiency.
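Why sparse features strain memory bandwidth can be shown with a small sketch. In recommendation models, sparse categorical features (user or item IDs, for example) are typically mapped to rows of large embedding tables, and the gathered rows are then pooled. The Python below is a generic, hypothetical illustration of such a lookup, not a description of how SEAL works; the sizes and helper names are our own assumptions. It compares the lookup's arithmetic intensity (FLOPs per byte read) with that of a dense matrix multiply.

```python
import random

# Hypothetical sizes chosen for illustration only.
EMBED_DIM = 64        # width of each embedding vector
NUM_ROWS = 1_000      # rows in the table (production tables are far larger)
BYTES_PER_FLOAT = 4   # fp32 storage

def make_table(num_rows, dim, seed=0):
    """Random embedding table standing in for learned weights."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(num_rows)]

def embedding_bag_sum(table, indices):
    """Gather the rows named by a sparse feature's indices and sum them."""
    pooled = [0.0] * len(table[0])
    for idx in indices:            # indices are typically scattered across the table
        row = table[idx]
        for j, value in enumerate(row):
            pooled[j] += value     # one add per element fetched from memory
    return pooled

def lookup_intensity(num_lookups, dim, bytes_per_value=BYTES_PER_FLOAT):
    """FLOPs per byte read for one pooled lookup."""
    flops = num_lookups * dim
    bytes_read = num_lookups * dim * bytes_per_value
    return flops / bytes_read

def matmul_intensity(n, bytes_per_value=BYTES_PER_FLOAT):
    """FLOPs per byte for an n x n dense matrix multiply, assuming each of
    the three matrices crosses the memory bus exactly once."""
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * bytes_per_value
    return flops / bytes_moved

table = make_table(NUM_ROWS, EMBED_DIM)
pooled = embedding_bag_sum(table, [3, 17, 512, 999])
print(len(pooled))                       # 64
print(lookup_intensity(4, EMBED_DIM))    # 0.25
print(round(matmul_intensity(1024), 1))  # 170.7
```

Under these assumptions the lookup does about 0.25 FLOPs per byte regardless of how many rows it gathers, while the dense multiply's ratio grows with n. That gap is why recommendation inference tends to stall on memory bandwidth while compute units sit idle.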

Another product, our MAU Accelerator, follows a similar ethos to enable efficiency gains through the exploitation of sparsity.

It exploits the high degrees of sparsity exhibited by RNNs (recurrent neural networks) and other DNNs (deep neural networks), increasing latency-bounded throughput by as much as 165x.

Typical applications for this product include speech transcription, speech synthesis, NLP, time series analysis and fraud detection.
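One common way to exploit weight sparsity is to store only the nonzero weights, for example in compressed sparse row (CSR) format, so that a matrix-vector product skips the zeros entirely. The Python below is a generic, hypothetical sketch of that technique under our own assumptions; it is not the MAU Accelerator's implementation and does not attempt to reproduce the 165x figure.

```python
# Hypothetical illustration of exploiting weight sparsity with CSR storage.
# Generic technique only, not Myrtle's MAU design.

def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)     # keep only nonzero weights
                col_idx.append(j)    # remember which column each came from
        row_ptr.append(len(values))  # where the next row's nonzeros start
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """y = W @ x, touching only the stored nonzeros of W."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

def dense_matvec(dense, x):
    """Reference dense product: one multiply-add per weight, zero or not."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in dense]

W = [[0.0, 2.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 3.0],
     [0.0, 0.0, 0.0, 0.0]]
x = [1.0, 2.0, 3.0, 4.0]
vals, cols, ptrs = to_csr(W)
print(csr_matvec(vals, cols, ptrs, x))  # [4.0, 13.0, 0.0]
print(dense_matvec(W, x))               # [4.0, 13.0, 0.0]
```

Both products agree, but the dense version performs 12 multiply-adds and the CSR version only 3. At high sparsity that ratio is where the savings come from, although hardware must also cope with the irregular memory access that CSR indexing introduces.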

AIB: What does the future of data center infrastructure look like?

PB: Just looking back over the last few years highlights how fast this industry is changing.

We’ve seen a move from training to inference, which has changed the demands placed on data centers.

We’ve seen a massive increase in the use of ML in data centers, which has demanded efficiency improvements. And we’ve seen a rapid evolution in ML models: they’re deeper and more diverse.

Each new model challenges the hardware infrastructure in different ways.

Flexible, heterogeneous compute platforms are required to respond to these changing demands and that’s our business.

As the industry capitalizes on the most efficient use of existing and emerging silicon, exploits model compression techniques and tracks the latest algorithmic innovations, it will certainly be an interesting future.

Who is Myrtle.ai?

Cambridge-based Myrtle.ai is focused on realizing deep learning networks as efficient silicon designs based on FPGAs, which execute at low latency and low power.

The company accelerates performance critical workloads that are currently being deployed at scale in global data centers.

It is also involved in a major collaboration to address the safety and verification challenges of using sophisticated AI in road vehicles.


About the Authors

Guy Matthews

Freelance Contributor
