First open source, large multilingual model not owned by Big Tech
Large language models are being used in AI to do such things as generate content or predict the next word in a sentence. But access has been limited, and the resources needed to build them have mainly resided in Big Tech companies.
A year ago, a consortium called BigScience, comprising more than 1,000 researchers from over 70 countries and at least 250 institutions, began developing an open source large language model in multiple languages.
The result is BLOOM (BigScience Large Open-science Open-access Multilingual Language Model), a 176-billion-parameter, general-purpose multilingual AI model that is open source. It is larger than the groundbreaking language model GPT-3 by a billion parameters.
BLOOM can generate text in 46 natural languages and 13 programming languages. For languages such as French, Arabic and Spanish, this is the first time they have been represented in a language model with more than 100 billion parameters.
The model can be downloaded and run on a local machine or in the cloud. For researchers who lack access to servers large enough to run it, Hugging Face, the AI startup leading BigScience, is working on an inference API for large-scale use without dedicated hardware or engineering. An early version of the API is already available for lower-scale testing.
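To give a sense of why running BLOOM normally calls for large servers, here is a rough back-of-the-envelope sketch in Python. It assumes a dense model of 176 billion parameters and counts only the bytes needed to hold the weights at a few common numeric precisions; the helper name `weight_memory_gb` is our own, and real deployments need extra memory for activations and runtime overhead on top of this.

```python
# Rough, weights-only memory estimate for a dense model of BLOOM's size.
# Assumption: 176 billion parameters (the figure quoted in this article).
# Activations, KV caches and framework overhead are deliberately ignored.

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,
    "int8": 1,
}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the memory needed just to store the weights, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


BLOOM_PARAMS = 176_000_000_000

if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: ~{weight_memory_gb(BLOOM_PARAMS, dtype):,.0f} GB")
```

At bfloat16 precision the weights alone come to roughly 352 GB, far more than a single consumer GPU holds, which is why a hosted inference API is attractive for researchers without dedicated hardware.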
Teams from Nvidia’s Megatron, Microsoft’s DeepSpeed and the French National Research Agency came together to build BLOOM. French research agencies CNRS and GENCI supported the project with a compute grant worth $3 million to train the model on the Jean Zay supercomputer near Paris.
Researchers can download BLOOM under a Responsible AI License. This license, created by BigScience, permits reuse, distribution and commercialization, provided that users of the model commit not to apply it to restricted use cases.
Restricted use cases include generating false information to harm others, impersonation, automated decision-making that harms an individual’s legal rights, and discrimination based on legally protected characteristics.
Not controlled by Big Tech
CambrianAI analyst Alberto Romero described BLOOM in a blog post as “the most important AI Model of the decade.”
That’s because BLOOM is open source, whereas other models such as GPT-3, PaLM, Gopher, Chinchilla and Gato all “stem from the immense resources of private tech companies … (and their research labs) exert absolute control over them,” he wrote. BLOOM “will break the stranglehold big tech has on the research and development of large language models.”
While some big tech companies have recently open-sourced some of their large transformer-based models, Romero believes these do not represent their best work. “Earning money is their main goal, so sharing their state-of-the-art research isn’t on the table,” he wrote.
“BigScience and BLOOM are, without a doubt, the most notable attempt at bringing down all the barriers that big tech has erected — willingly or unwillingly — throughout the last decade in the AI field,” he said. “And the most sincere and honest undertaking to building AI (large language models in particular) that benefits everyone.”
BLOOM appears to be just the beginning. Alongside the announcement, BigScience outlined ambitious plans for the model.
Those plans include adding more languages, compressing the model into a more usable version that retains the same level of performance, and using it as a starting point for more complex architectures.
“BLOOM is the seed of a living family of models that we intend to grow, not just a one-and-done model, and we’re ready to support community efforts to expand it,” according to the consortium.
How does it stack up?
With 176 billion parameters, BLOOM is a big model. It is bigger than OpenAI’s GPT-3 and Meta’s OPT-175B by one billion parameters. DeepMind’s Gato, by comparison, was studied at three model sizes: 79 million, 364 million and 1.18 billion parameters.
The smallest model was found to perform the worst, with the results suggesting that greater capacity allows the model to use representations learned from the diverse training data at test time.
BLOOM is thus far bigger than DeepMind’s recently released Gato, but it is dwarfed by another DeepMind language model, Gopher, as well as by Naver’s HyperCLOVA and by Megatron-Turing NLG (MT-NLG) from Microsoft and Nvidia.
But despite outsizing the likes of GPT-3, BLOOM still lags behind WuDao 2.0, reportedly the world’s largest language model at 1.75 trillion parameters, and Google Brain’s Switch Transformer, with 1.6 trillion parameters. Neither of those, however, is a monolithic transformer model (both are sparse mixture-of-experts models), which prevents a meaningful apples-to-apples comparison.