Meta AI releases vast language model to expand access to innovation
by Ben Wodecki
Its OPT-175B model has 175 billion parameters, on par with GPT-3.
Natural language processing (NLP) models enable machines to read and interpret language. Large models, those with more than 100 billion parameters, supercharge this ability and can create poetry and even write code.
Meta’s AI researchers have unveiled OPT-175B: an NLP model with 175 billion parameters.
Meta is releasing the model under a noncommercial license available only to academic researchers, organizations affiliated with governments, civil society and academia, as well as industry research laboratories. Those looking to use OPT-175B must fill out a request form.
Meta chose this license to “maintain integrity and prevent misuse,” a company blog post reads.
OPT-175B was trained on publicly available datasets. Meta AI said it is also releasing a logbook that documents the model’s training process.
The logbook details how much compute was used to train the model and the human overhead required when underlying infrastructure or the training process itself becomes unstable at scale.
The codebase used to train and deploy the model using only 16 Nvidia V100 GPUs is also being shared, with Meta suggesting this would increase the accessibility of the models specifically for research purposes.
Further unveiled was a suite of smaller-scale baseline models, trained on the same datasets, so researchers can study the effect of scale.
The smaller-scale models have parameter counts of 125 million, 350 million, 1.3 billion, 2.7 billion, 6.7 billion, 13 billion and 30 billion. A 66 billion-parameter model is set to be released soon.
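To give a rough sense of what these sizes mean in hardware terms, the sketch below estimates the memory needed just to hold each model's weights and compares it with the pooled memory of 16 V100 GPUs. It rests on assumptions that are not in Meta's announcement: weights stored in half precision (2 bytes per parameter) and the 32 GB variant of the V100. It also ignores activations, optimizer state and other overhead, so it is a lower bound, not a description of how Meta actually runs the model.

```python
# Rough weights-only memory estimate for the OPT model family.
# Assumptions (not from Meta's announcement): fp16 weights, 32 GB V100 cards.
BYTES_PER_PARAM_FP16 = 2   # assumption: half-precision weights
V100_MEMORY_GB = 32        # assumption: 32 GB variant (a 16 GB variant also exists)
NUM_GPUS = 16              # GPU count quoted in the article

# Parameter counts listed in the article (66B was announced as forthcoming).
OPT_SIZES = {
    "125M": 125e6,
    "350M": 350e6,
    "1.3B": 1.3e9,
    "2.7B": 2.7e9,
    "6.7B": 6.7e9,
    "13B": 13e9,
    "30B": 30e9,
    "66B": 66e9,
    "175B": 175e9,
}


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Memory needed to store the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


pooled_gb = NUM_GPUS * V100_MEMORY_GB

for name, num_params in OPT_SIZES.items():
    gb = weight_memory_gb(num_params)
    fits = gb <= pooled_gb
    print(f"OPT-{name:>4}: {gb:8.2f} GB of weights, fits in {pooled_gb} GB pool: {fits}")
```

Under these assumptions the 175 billion-parameter model needs about 350 GB for its weights, which fits within the 512 GB pooled across 16 cards but not on any single GPU, so the weights have to be split across devices.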
The OPT release comes shortly after Meta announced it was conducting long-term research on how the brain processes language as it seeks to develop better AI models for understanding spoken and written words.
RSC: A big AI model requires a big AI supercomputer
Meta’s decision to release OPT comes as it’s attempting to build the world’s fastest AI supercomputer.
The AI Research SuperCluster (RSC), which will have 16,000 Nvidia A100 GPUs, is being used to train large computer vision and NLP models.
This is the first time the company, formerly known as Facebook, has had a supercomputer capable of training ML models on real-world data sourced from the company’s social media platforms.
Its previous AI supercomputer, launched in 2017, is a proverbial dinosaur compared with RSC. Meta claims the new supercomputer already delivers three times more performance in large-scale NLP workflows, using less than half of its final hardware footprint.
However, RSC is midway through being built and won’t be completed until later this year.
Size comparison: On par with GPT-3
With 175 billion parameters, OPT-175B is one big model. OpenAI’s GPT-3, one of the most famous language models in the world, has the same parameter count.
Although OPT-175B matches GPT-3 in size, Meta’s AI team claims it was trained with only one-seventh of the carbon footprint of OpenAI’s model.
“This was achieved by combining Meta’s open source Fully Sharded Data Parallel (FSDP) API and Nvidia’s tensor parallel abstraction within Megatron-LM. We achieved ~147 TFLOP/s/GPU utilization on Nvidia’s 80 GB A100 GPUs, roughly 17% higher than published by Nvidia researchers on similar hardware.”
“By sharing these baselines along with the codebase to train a 175 billion model efficiently, we have an opportunity to reduce our collective environmental footprint while also allowing new results and progress in the field to be measurable in a consistent manner,” Meta said.
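The throughput figures in Meta's statement can be unpacked with a little arithmetic. The sketch below derives the Nvidia baseline implied by "roughly 17% higher" and expresses 147 TFLOP/s as a fraction of the A100's theoretical peak. The 312 TFLOP/s peak (dense FP16/BF16 tensor-core throughput from Nvidia's A100 specifications) is an assumption brought in for illustration; Meta's statement does not mention it, and the derived baseline is only approximate because the 17% figure is itself rounded.

```python
# Back-of-the-envelope check of the quoted throughput figures.
ACHIEVED_TFLOPS = 147.0     # per-GPU figure quoted by Meta
RELATIVE_GAIN = 0.17        # "roughly 17% higher" than Nvidia's published result
A100_PEAK_TFLOPS = 312.0    # assumption: A100 dense FP16/BF16 tensor-core peak


def implied_baseline(achieved, gain):
    """Throughput that `achieved` would be `gain` (as a fraction) higher than."""
    return achieved / (1.0 + gain)


def utilization(achieved, peak):
    """Fraction of theoretical peak throughput actually achieved."""
    return achieved / peak


baseline = implied_baseline(ACHIEVED_TFLOPS, RELATIVE_GAIN)
util = utilization(ACHIEVED_TFLOPS, A100_PEAK_TFLOPS)

print(f"Implied Nvidia baseline: {baseline:.1f} TFLOP/s per GPU")
print(f"Share of assumed A100 peak: {util:.1%}")
```

On these assumptions, the implied baseline is roughly 126 TFLOP/s per GPU, and Meta's figure corresponds to a little under half of the A100's theoretical peak.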
Despite its efficiency argument, OPT is dwarfed by other models, including Naver’s 204 billion-parameter HyperClova; DeepMind’s Gopher, which has 280 billion parameters; and MT-NLG, the Megatron-Turing model from Microsoft and Nvidia, which boasts 530 billion.
The world’s largest language model is WuDao 2.0, which Chinese researchers claim has 1.75 trillion parameters. Google Brain previously developed an AI language model with 1.6 trillion parameters, using what it called Switch Transformers. However, neither of these is a monolithic transformer model, preventing a meaningful ‘apples-to-apples’ comparison.
Last October, Microsoft and Nvidia introduced Megatron-Turing Natural Language Generation (MT-NLG) with 530 billion parameters. Six months earlier, Chinese tech giant Huawei unveiled what it called the world’s largest Chinese NLP model, Pangu NLP, with 207 billion parameters.