Hugging Face Offers 'Training Cluster as a Service'
Businesses can access Nvidia-powered GPU clusters for training domain-specific large language models. There's also a cost calculator.
At a Glance
- Hugging Face, capitalizing on growing interest in custom language models, offers a cloud service to train them at scale.
Hugging Face has unveiled a cloud service that lets developers access large compute clusters for training large language models.
The open-source repository's 'Training Cluster as a Service' gives users access to Hugging Face’s own GPUs – of which the company has thousands, including Nvidia H100s and A100s.
Developers can use the service to train text or multimodal models ranging from 7 billion to 70 billion parameters. Users can bring their own dataset or work with Hugging Face to build one.
The service has a training cost calculator as well. For example, if you were to train the 7B version of Meta’s Llama 2 on 301 billion tokens using 200 Nvidia A100 GPUs, it would cost an estimated $57,221 to run for six days.
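The calculator's exact pricing formula isn't public, but the quoted figure is consistent with a simple GPUs × hours × hourly-rate calculation. The sketch below back-calculates an implied per-GPU-hour rate from the article's example; the rate is an inference from those numbers, not an official Hugging Face price.

```python
# Rough sketch of how a training-cost estimate like the one above might be
# derived. The per-GPU-hour rate is back-calculated from the article's
# example figures, not an official Hugging Face price.

def training_cost(num_gpus: int, days: float, usd_per_gpu_hour: float) -> float:
    """Estimated cost = number of GPUs x wall-clock hours x hourly rate."""
    return num_gpus * days * 24 * usd_per_gpu_hour

# Article's example: 200 A100s for six days at an estimated $57,221 total.
# That implies roughly 57_221 / (200 * 6 * 24) ~= $1.99 per A100-hour.
implied_rate = 57_221 / (200 * 6 * 24)
print(f"Implied rate: ${implied_rate:.2f}/GPU-hour")

estimate = training_cost(num_gpus=200, days=6, usd_per_gpu_hour=implied_rate)
print(f"Estimated total: ${estimate:,.0f}")
```

Scaling any of the three inputs (GPU count, duration, or rate) scales the estimate linearly, which is why the calculator can quote prices for arbitrary cluster sizes.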
To access the Hugging Face Training Cluster as a Service, developers will need to join a waitlist.
Hugging Face said the new service is secure: it does not store training data, and users receive the full training output, including logs and checkpoints.
Hugging Face has plenty of experience training large language models. It was part of the BigScience team that built BLOOM and has gone on to release a host of its own models, including StarCoder (built with ServiceNow) and HuggingChat.
Hugging Face's new Training Cluster as a Service comes at a time when hardware for AI training is becoming increasingly scarce. Demand has skyrocketed amid the generative AI wave. Governments are trying to purchase chips to create national training centers. New AI startups like Inflection are using their funding to buy GPUs. And venture capital firms are even using connections to snap up AI chips to offer to their portfolio companies.
Julien Chaumond, Hugging Face’s CTO and co-founder, said on X (Twitter) that the new service gives companies access to scarce hardware. "Access to a large compute cluster is key for large-scale model training, but historically it's been hard to secure access to large numbers of hardware accelerators, even with a hefty budget."