Hugging Face Offers 'Training Cluster as a Service'
Businesses can access Nvidia-powered GPU clusters for training domain-specific large language models. There's also a cost calculator.
At a Glance
- Hugging Face, capitalizing on growing interest in custom language models, offers a cloud service to train them at scale.
Hugging Face has unveiled a cloud service that lets developers access large compute clusters for training large language models.
The open-source repository's 'Training Cluster as a Service' gives users access to Hugging Face’s own GPUs – of which the company has thousands, including Nvidia H100s and A100s.
Developers can use the service to train text or multimodal models ranging from 7 billion to 70 billion parameters. Users can bring their own dataset or work with Hugging Face to build one.
The service has a training cost calculator as well. For example, if you were to train the 7B version of Meta’s Llama 2 on 301 billion tokens using 200 Nvidia A100 GPUs, it would cost an estimated $57,221 to run for six days.
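The calculator's exact pricing formula isn't public, but the quoted figure is consistent with a simple GPUs × hours × hourly-rate calculation. The sketch below back-calculates an implied per-GPU-hour rate from the article's example; the rate is an inference from those numbers, not an official Hugging Face price.

```python
# Rough sketch of how a training-cost estimate like the one above might be
# derived. The per-GPU-hour rate is back-calculated from the article's
# example figures, not an official Hugging Face price.

def training_cost(num_gpus: int, days: float, usd_per_gpu_hour: float) -> float:
    """Estimated cost = number of GPUs x wall-clock hours x hourly rate."""
    return num_gpus * days * 24 * usd_per_gpu_hour

# Article's example: 200 A100s for six days at an estimated $57,221 total.
# That implies roughly 57_221 / (200 * 6 * 24) ~= $1.99 per A100-hour.
implied_rate = 57_221 / (200 * 6 * 24)
print(f"Implied rate: ${implied_rate:.2f}/GPU-hour")

estimate = training_cost(num_gpus=200, days=6, usd_per_gpu_hour=implied_rate)
print(f"Estimated total: ${estimate:,.0f}")
```

Scaling any of the three inputs (GPU count, duration, or rate) scales the estimate linearly, which is why the calculator can quote prices for arbitrary cluster sizes.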
To access the Hugging Face Training Cluster as a Service, developers will need to join a waitlist.
Hugging Face said the new service is secure: it does not store training data, and users receive the full training output, including logs and checkpoints.
Hugging Face has plenty of experience training large language models. It was part of the BigScience team that built BLOOM and has gone on to release a host of its own models, including StarCoder (built with ServiceNow) and HuggingChat.
Hugging Face's new Training Cluster as a Service comes at a time when hardware for AI training is becoming increasingly scarce. Demand has skyrocketed amid the generative AI wave. Governments are trying to purchase chips to create national training centers. New AI startups like Inflection are using their funding to buy GPUs. And venture capital firms are even using connections to snap up AI chips to offer to their portfolio companies.
Julien Chaumond, Hugging Face’s CTO and co-founder, said on X (Twitter) that the new service gives companies access to scarce hardware. "Access to a large compute cluster is key for large-scale model training, but historically it's been hard to secure access to large numbers of hardware accelerators, even with a hefty budget."