August 30, 2022
Minds behind WuDao 2.0 develop AI model to run on a single V100 server
The model, dubbed GLM-130B, has 130 billion parameters and supports both English and Chinese. In comparison, GPT-3, OPT-175B and Bloom each have around 175 billion parameters; GPT-3 and Bloom are multilingual but do not support Chinese.
The researchers said that while GPT-3 is the “pioneer” in this field, “it is not available to most people in the world.” The goal of their project was to create a bilingual language model that is “open to everyone in the world – anyone, anywhere can download it” and use it on a single server with the right GPUs, they wrote in a paper.
Notably, the Chinese model was trained using web-crawled data, meaning it could generate potentially harmful or offensive content, similar to Meta’s BlenderBot3.
GLM-130B has been trained on over 400 billion text tokens (200 billion each for Chinese and English).
The model utilizes autoregressive blank infilling: Essentially, it takes a sentence, masks random continuous spans and predicts them autoregressively. In one example, the model predicts lyrics from Bob Dylan’s Like a Rolling Stone, with the lyric ‘complete unknown’ masked.
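To make the masking step concrete, here is a minimal Python sketch of how one blank-infilling training example could be built. It is an illustration only, not the researchers' code: the special tokens `[MASK]`, `[S]` and `[E]`, the function name `blank_infill_example`, and the whitespace tokenization are all assumptions, and the masked spans are passed in explicitly rather than sampled at random.

```python
from typing import List, Tuple

# Hypothetical special tokens, chosen for illustration only.
MASK, START, END = "[MASK]", "[S]", "[E]"


def blank_infill_example(
    tokens: List[str], spans: List[Tuple[int, int]]
) -> Tuple[List[str], List[Tuple[List[str], List[str]]]]:
    """Build one blank-infilling example.

    spans is a list of non-overlapping (start, length) pairs.
    Returns the corrupted sentence, with each span collapsed into a
    single mask token, plus one (decoder_input, decoder_target) pair
    per span. The target is the input shifted by one position, which
    is what makes the prediction autoregressive.
    """
    corrupted: List[str] = []
    targets: List[Tuple[List[str], List[str]]] = []
    pos = 0
    for start, length in sorted(spans):
        if start < pos or start + length > len(tokens):
            raise ValueError("spans must be in range and must not overlap")
        corrupted.extend(tokens[pos:start])  # keep the visible text
        corrupted.append(MASK)               # one mask per span
        span = tokens[start:start + length]
        targets.append(([START] + span, span + [END]))
        pos = start + length
    corrupted.extend(tokens[pos:])
    return corrupted, targets


lyric = "like a complete unknown like a rolling stone".split()
corrupted, targets = blank_infill_example(lyric, [(2, 2)])
print(" ".join(corrupted))  # like a [MASK] like a rolling stone
print(targets[0])           # (['[S]', 'complete', 'unknown'], ['complete', 'unknown', '[E]'])
```

During training, a model of this kind would see the corrupted sentence and learn to generate each masked span one token at a time, conditioning on the tokens it has already produced.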
According to the minds behind the model, GLM outperforms GPT-3 in few-shot learning on the Massive Multi-Task Language Understanding (MMLU) benchmark. It is also more open.
Microsoft holds an exclusive license to use GPT-3, with developers required to sign up to access its API. Geographic restrictions apply, as certain countries are not supported, and there is also a language barrier, as it only supports English, something the team from Tsinghua sought to change.
As for accuracy, the Chinese-made model achieved a zero-shot accuracy of 80.2% on the LAMBADA benchmark, while GPT-3, Bloom and OPT could only manage 76.2%, 67.2% and 74.7%, respectively.
It took the team from Tsinghua two months to train the model, during which they also developed a program to perform inference tasks using GLM on only a single server powered by Nvidia V100 GPUs. The research team said they plan to scale the model’s inference ability to run on an RTX-3090 server.
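A back-of-the-envelope calculation shows why fitting a 130-billion-parameter model on one server is a notable constraint. The Python sketch below estimates the memory needed for the weights alone. Several things here are assumptions not stated in the article: 16-bit weights (2 bytes per parameter), V100 cards of the 32 GB variant, and the helper names themselves. It also ignores activations and other runtime overhead, so the result is a lower bound.

```python
import math

GIB = 2 ** 30  # bytes per gibibyte


def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Memory needed to hold the model weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


def min_gpus_for_weights(
    num_params: int, bytes_per_param: float, gpu_memory_gib: float
) -> int:
    """Fewest GPUs whose combined memory can hold the weights.

    A lower bound: real inference also needs room for activations.
    """
    needed = weight_memory_gib(num_params, bytes_per_param)
    return math.ceil(needed / gpu_memory_gib)


GLM_PARAMS = 130_000_000_000

# Assumption: 16-bit weights and 32 GB V100 cards.
print(round(weight_memory_gib(GLM_PARAMS, 2), 1))  # 242.1
print(min_gpus_for_weights(GLM_PARAMS, 2, 32))     # 8
```

Under these assumptions the weights take roughly 242 GiB, which would need at least eight 32 GB cards. Changing `bytes_per_param` shows how lower-precision weights would shrink that footprint for smaller cards such as the 24 GB RTX 3090.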
GLM users are prohibited from “knowingly generating or allowing others to knowingly generate harmful content, including hateful, harassment, violence, adult, political, deception,” the model’s rules read.
Bigger not always better
GLM-130B is the second AI model created by Tsinghua University’s AI team.
Alongside the Beijing Academy of Artificial Intelligence, Tsinghua’s researchers made WuDao 2.0, an AI model with a whopping 1.75 trillion parameters, making it the world’s largest language model.
The mammoth model can reportedly predict the 3D structures of proteins, similar to ESMFold and AlphaFold, among other tasks. However, it’s important to note that the size of a language model often does not correlate to quality – and because WuDao is not a monolithic transformer model, a meaningful ‘apples-to-apples’ comparison is not possible.