Google Discovers Novel Method to Improve Speech Processing

AudioPaLM combines Google’s PaLM-2 LLM with an audio generation model to outperform OpenAI’s Whisper Large-v2

June 27, 2023

At a Glance

Google has unveiled AudioPaLM, a combination of its PaLM-2 LLM and its AudioLM audio generation model.
AudioPaLM can generate text and speech for speech recognition and speech-to-speech translation.

AI researchers from Google have found that adding a large language model to an audio generation model improves performance on tasks such as speech recognition and translation.

They developed AudioPaLM, which is a combination of AudioLM, an audio generation model, and PaLM-2, Google’s flagship large language model. It is designed to leverage larger quantities of text training data to assist with speech tasks.

Google’s researchers contend that adding a text-only large language model to an audio-generative system improves speech processing, and that the combined model outperforms existing systems on speech translation tasks.

A paper outlining the model shows AudioPaLM outperforming speech models such as Whisper Large-v2 from OpenAI, mSLAM-CTC 2B and Google’s own USM-M on the CoVoST 2 corpus, as measured by BLEU scores.
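For readers unfamiliar with BLEU, the metric scores a machine translation by how many word sequences (n-grams) it shares with a human reference translation, with a penalty for output that is too short. The snippet below is a minimal sketch of that idea, assuming a single reference, whitespace tokenization and no smoothing. The function names `ngrams` and `bleu` are illustrative, and this is not the evaluation code used in the paper, which reports corpus-level scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU (assumption: one reference, no smoothing)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: a candidate n-gram is credited at most as many
        # times as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    print(bleu("the cat sat on the mat", "the cat sat on the red mat"))
```

An exact match scores 1.0 and a translation sharing no words with the reference scores 0.0. Published results usually multiply the score by 100 and aggregate n-gram counts over an entire test set rather than scoring one sentence at a time.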

AudioPaLM can also be fine-tuned to consume and produce tokenized audio on a mixture of speech and text tasks. According to the paper, the model can perform zero-shot speech-to-text translation for language combinations not seen in its training, and can carry a speaker's voice across languages based on a short spoken prompt.

Google has opted not to release the code for the model, instead publishing a series of examples to GitHub.

Researchers from rival Meta took a similar approach with their recently released multimodal audio model, Voicebox, for fear it could be used for malicious purposes. Google's researchers, however, did not say why they elected not to publish the code.

Alongside AudioPaLM, Google has applied PaLM to various other fields to achieve sector-specific results. These include Sec-PaLM, which can detect malicious scripts for cybersecurity experts, and Med-PaLM-2, which can be used to help identify medical issues from images such as X-rays.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
