August 30, 2022
Minds behind WuDao 2.0 develop AI model to run on a single V100 server
The model, dubbed GLM-130B, has 130 billion parameters and supports both English and Chinese. In comparison, GPT-3, OPT-175B and Bloom each have around 175 billion parameters; GPT-3 and Bloom are multilingual but do not support Chinese.
The researchers said that while GPT-3 is the “pioneer” in this field, “it is not available to most people in the world.” The goal of their project was to create a bilingual language model that is “open to everyone in the world – anyone, anywhere can download it” and use it on a single server with the right GPUs, they wrote in a paper.
Notably, the Chinese model was trained using web-crawled data, meaning it could generate potentially harmful or offensive content, similar to Meta’s BlenderBot 3.
GLM-130B has been trained on over 400 billion text tokens (200 billion each for Chinese and English).
The model uses autoregressive blank infilling: essentially, it takes a sentence, masks random continuous spans and predicts them autoregressively. In an example from the researchers, the model predicts lyrics from Bob Dylan’s Like a Rolling Stone, with the lyric ‘complete unknown’ masked.
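To make the idea concrete, here is a minimal Python sketch of span masking followed by token-by-token prediction targets. It is an illustration only, not the researchers' implementation: the `[MASK]` placeholder, the function names and the fixed span positions are all assumptions made for this example, and a real model would sample spans at random and work on token IDs rather than words.

```python
from typing import Iterator, List, Tuple

MASK = "[MASK]"  # hypothetical placeholder token, chosen for this sketch


def mask_spans(
    tokens: List[str], spans: List[Tuple[int, int]]
) -> Tuple[List[str], List[List[str]]]:
    """Replace each [start, end) span with one MASK token.

    Returns the corrupted sequence and the hidden spans the model
    would be asked to fill in.
    """
    corrupted: List[str] = []
    targets: List[List[str]] = []
    cursor = 0
    for start, end in sorted(spans):
        corrupted.extend(tokens[cursor:start])  # keep the visible text
        corrupted.append(MASK)                  # one placeholder per span
        targets.append(tokens[start:end])       # hidden span to predict
        cursor = end
    corrupted.extend(tokens[cursor:])
    return corrupted, targets


def autoregressive_steps(
    corrupted: List[str], target: List[str]
) -> Iterator[Tuple[List[str], str]]:
    """Yield (context, next_token) pairs for one masked span.

    Each step sees the corrupted text plus the span tokens produced so far.
    """
    generated: List[str] = []
    for token in target:
        yield corrupted + generated, token
        generated = generated + [token]


lyric = "like a complete unknown like a rolling stone".split()
corrupted, targets = mask_spans(lyric, [(2, 4)])  # hide 'complete unknown'
steps = list(autoregressive_steps(corrupted, targets[0]))
```

Here `corrupted` holds the lyric with one placeholder in place of the hidden words, and `steps` lists what the model would have to predict at each position, given everything it has produced so far.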
According to the minds behind the model, GLM outperforms GPT-3 in few-shot learning on the Massive Multitask Language Understanding (MMLU) benchmark. It is also more open.
Microsoft holds an exclusive license to use GPT-3, with developers required to sign up to access its API. Geographic restrictions apply, as certain countries are not supported, and there is a language barrier, since it only supports English, something the team from Tsinghua sought to change.
As for accuracy, the Chinese-made model achieved zero-shot accuracy of 80.2% on the LAMBADA language modeling benchmark, while GPT-3, Bloom and OPT managed 76.2%, 67.2% and 74.7%, respectively.
It took the team from Tsinghua two months to train the model, during which they also developed a program to perform inference tasks using GLM on only a single server powered by Nvidia V100 GPUs. The research team said they plan to scale the model’s inference ability to run on an RTX-3090 server.
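A rough back-of-the-envelope calculation shows why fitting a model of this size on one server is a challenge. The Python sketch below estimates the memory needed for the weights alone at different numeric precisions. The server configuration (eight GPUs with 32 GB each) and the choice of precisions are assumptions for illustration; the article does not state them, and the estimate ignores activations and other runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


GLM_PARAMS = 130e9   # parameter count reported in the article
SERVER_GPUS = 8      # assumption: GPUs per server (not stated in the article)
GPU_MEMORY_GB = 32   # assumption: memory per GPU (not stated in the article)
server_memory_gb = SERVER_GPUS * GPU_MEMORY_GB

for bits in (16, 8, 4):
    needed = weight_memory_gb(GLM_PARAMS, bits)
    verdict = "fits" if needed <= server_memory_gb else "does not fit"
    print(f"{bits}-bit weights: {needed:.0f} GB, {verdict} in {server_memory_gb} GB")
```

Under these assumptions, 16-bit weights alone would exceed the server's total GPU memory, which is why lower-precision formats are a common route to running large models on smaller hardware.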
GLM users are prohibited from “knowingly generating or allowing others to knowingly generate harmful content, including hateful, harassment, violence, adult, political, deception,” the model’s rules read.
Bigger not always better
GLM-130B is the second AI model created by Tsinghua University’s AI team.
Alongside the Beijing Academy of Artificial Intelligence, Tsinghua’s researchers made WuDao 2.0, an AI model with a whopping 1.75 trillion parameters, making it the world’s largest language model.
The mammoth model can reportedly handle tasks such as predicting the 3D structures of proteins, similar to ESMFold and AlphaFold. However, it’s important to note that the size of a language model often does not correlate with quality, and because WuDao is not a monolithic transformer model, a meaningful ‘apples-to-apples’ comparison is not possible.
Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.