The chip wars are well underway
Amazon Web Services has debuted Trainium, a custom chip designed specifically for training machine learning models.
Launching next year, the hardware will be joined in the cloud by another AI-focused chip, Intel’s Habana Gaudi processor.
Machine learning as far as the eye can see
Trainium complements Amazon’s existing Inferentia chip, which handles the less computationally intensive inference workloads required to run machine learning models.
The new chip focuses on the more difficult task of training the model, something that has typically been handled by GPUs. Over the last few years, a number of new processors have emerged in an attempt to dethrone the GPU as a tool for machine learning, notably Google’s Tensor Processing Units (TPUs), although these are available only through Google’s own cloud service.
Amazon claimed that Trainium will offer the most teraflops of any machine learning instance in the cloud at the lowest cost, but did not provide any benchmarks.
By the time the chip launches as EC2 instances, some time in the second half of 2021, Nvidia and AMD will likely have updated their GPU lineups, and Google may have new TPUs out. It is not clear whether Amazon’s claim for Trainium will still hold by then.
Then there’s Habana. Intel acquired the company in late 2019 for $2bn, and immediately killed off its own previously developed Nervana AI chips.
Intel claims that Gaudi accelerators deliver up to 40 percent better price-performance than current GPU-based EC2 instances for machine learning workloads, but again specific benchmarks have not been published.
The Gaudi-based EC2 instances, slated for the first half of 2021, will feature up to eight Gaudi accelerators per server. An eight-card EC2 instance can process about 12,000 images per second when training the ResNet-50 model on TensorFlow, Intel claims.
“We are proud that AWS has chosen Habana Gaudi processors for its forthcoming EC2 training instances,” said David Dahan, chief executive officer at Habana.
“The Habana team looks forward to our continued collaboration with AWS to deliver on a roadmap that will provide customers with continuity and advances over time.”
Neither AWS nor Intel has commented on which of the two processors has the better price-performance ratio.