Meta Launches Seamless: AI for Real-Time, Expressive Translation

Meta marks a decade of its FAIR research team with new AI translation models and a dataset for wearables

Ben Wodecki, Jr. Editor

December 6, 2023

3 Min Read
Meta showcased Seamless, AI models for powering language translation that retain vocal nuances. Credit: AI Business via Runway

At a Glance

  • Meta's new Seamless AI offers real-time, expressive language translation.
  • The tech giant also unveils a dataset to power AI systems in wearables.

Facebook parent Meta celebrated 10 years of its FAIR AI lab by unveiling Seamless, a family of AI models for real-time language translation.

The family of models is built atop SeamlessM4T v2, the latest version of the foundational model released last August. The models are designed to power cross-lingual communication in real time while retaining expressive elements of speech, such as tone, pauses and emphasis.

The new suite incorporates measures to mitigate toxicity and bias as well as audio watermarking to prevent misuse. A similar watermarking measure was introduced in Meta’s recent Audiobox AI system.

Alongside SeamlessM4T v2, the family of audio translation models contains:

  • SeamlessExpressive – A model for preserving expression in speech-to-speech translation. It aims to retain a speaker’s emotion and style while addressing common weaknesses of AI translations around speech rate and pauses for rhythm.

  • SeamlessStreaming – A model that begins generating a translation while the speaker is still talking, delivering output in another language with around two seconds of latency.

Meta said the models could improve current translation methods that are too slow for effective communication.

“Tone of voice, pauses and emphasis carry important signals that help us communicate emotions and intent. Moreover, human speech and translation are sensitive to nuances such as turn-taking and timing controls.

“Picture, for example, how human interpreters work: they find just the right balance between low-latency and accurate translations. Waiting too long stifles the flow of communication while going too fast compromises the overall quality of a translation.”

The suite of models could be used to improve communications between global leaders or help tourists understand local lingo.

Accessing the Seamless AI models

Meta is open-sourcing the four Seamless models (Seamless, SeamlessM4T v2, SeamlessExpressive and SeamlessStreaming) in the hopes that researchers will build on their work. The Seamless suite can be accessed via GitHub.
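For a sense of how the open-sourced models can be used, the snippet below is a minimal sketch of translating English text into French speech with SeamlessM4T v2. It assumes the Hugging Face transformers integration and the facebook/seamless-m4t-v2-large checkpoint, neither of which is described in this article.

# A minimal sketch (not from Meta's announcement): running SeamlessM4T v2
# through the Hugging Face transformers integration, assuming the
# "facebook/seamless-m4t-v2-large" checkpoint is available.
import scipy.io.wavfile
from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# Translate English text into French speech.
inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")
waveform = model.generate(**inputs, tgt_lang="fra")[0].cpu().numpy().squeeze()

# The model generates 16 kHz audio; save it as a WAV file.
scipy.io.wavfile.write("translated_fr.wav", rate=16000, data=waveform)

The same model can also take audio inputs for speech-to-speech translation; the expressive and streaming variants are distributed in Meta’s seamless_communication repository on GitHub.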

Seamless cannot be used for commercial purposes: it’s available under a CC BY-NC 4.0 license, which prohibits commercial use. It’s also covered by an MIT license, which states that Meta won’t be liable for any claim arising from use of the software.

You can try out the SeamlessExpressive model for yourself. Using the SeamlessExpressive demo, you can generate translations into one of four languages – English, Spanish, German or French – while keeping expressive speech elements like speed and volume. You will, however, have to agree to terms before using the demo. The demo is not to be used to generate commercial content.

Meta announced it’s also releasing the related metadata, data and data alignment tools for Seamless to assist researchers.

Ego-Exo4D dataset

Also announced to mark a decade of FAIR was Ego-Exo4D, a new dataset suite for supporting multimodal, vision-focused models.

Meta published a dataset and benchmark with which users can test the video-learning capabilities of AI systems. Ego-Exo4D is designed to teach and evaluate AI systems’ ability to perceive human activities from a first-person perspective, simulating the view from wearable cameras.

Ego-Exo4D took some two years to develop with the support of 15 university partners, and it provides scenarios of activities like playing sports or washing dishes. It’s not solely video footage, either – the multimodal suite contains audio channels, inertial measurement unit readings and other sensor-based information.

Meta suggests it could be used to help future augmented reality (AR) systems with use cases like powering virtual AI coaches in smart glasses.

Alongside the first-person view, the dataset also includes multiple “exocentric” views from cameras surrounding the participant. Meta’s researchers believe that by supplying multiple views of a skill, models could “learn about the subtle aspects of skilled human activities.”

“To our knowledge, there is no prior video resource with such extensive and high-quality multimodal data,” Meta’s researchers said.

Ego-Exo4D will be available for download before the end of December 2023. Meta is planning to host a public benchmark challenge for Ego-Exo4D in 2024.

About the Author

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
