Meta Unveils AI Models that Translate Over 4,000 Languages

Meta researchers published a series of AI models trained on widely translated religious texts, such as the Bible.

Ben Wodecki, Jr. Editor

May 23, 2023


At a Glance

  • Meta has released Massively Multilingual Speech, a series of AI models that can be used to translate various languages.
  • Researchers used religious texts that were widely translated, like the Bible, to achieve the models' results.

Researchers at Meta unveiled a series of AI models that can be used to translate more than 4,000 languages. The Massively Multilingual Speech (MMS) models cover text-to-speech and speech-to-text.

Widely used AI applications such as ChatGPT typically support around 100 languages. Meta’s text-to-speech MMS model can generate speech in more than 1,100 languages, while its speech-to-text counterpart can identify more than 4,000 spoken languages.

According to Meta, the models could power VR and AR applications that operate in a person’s preferred language.

To collect audio data for thousands of languages, Meta’s researchers turned to religious texts, such as the Bible, that have been translated into many languages and whose translations have been widely studied in text-based translation research.


Meta created a dataset of New Testament readings in more than 1,100 languages, averaging 32 hours of data per language. To expand coverage, the researchers added unlabeled recordings of various other Christian religious readings, bringing the total to more than 4,000 languages.

“While the content of the audio recordings is religious, our analysis shows that this doesn’t bias the model to produce more religious language,” the researchers said.

Meta has opted to open source both the models and the underlying code so researchers can build on its work. The Facebook parent also said it plans to expand the MMS models’ coverage to support even more languages and to tackle dialects, a challenge that is often difficult for existing speech technology.

MMS is not the first set of models to cover a large number of languages. Rival Google has developed the Universal Speech Model, which supports around 300 languages.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
