Google Discovers Novel Method to Improve Speech Processing
AudioPaLM combines Google’s PaLM-2 LLM with an audio generation model to outperform OpenAI’s Whisper Large-v2
At a Glance
- Google has unveiled AudioPaLM, a combination of its PaLM-2 LLM with its AudioLM audio generation model.
- AudioPaLM can generate text and speech for speech recognition and speech-to-speech translation.
AI researchers from Google have found that adding a large language model to an audio generation model improves performance on tasks such as speech recognition and speech translation.
They developed AudioPaLM, which combines AudioLM, an audio generation model, with PaLM-2, Google’s flagship large language model. It is designed to leverage the larger quantities of text training data available to help with speech tasks.
Google’s researchers contend that building a text-only large language model into an audio-generative system improves speech processing, and that the resulting model outperforms existing systems on speech translation tasks.
A paper outlining the model shows AudioPaLM outperforming speech models such as OpenAI’s Whisper Large-v2, mSLAM-CTC 2B and Google’s own USM-M on speech translation, as measured by BLEU score on the CoVoST 2 benchmark.
AudioPaLM can also be fine-tuned on a mixture of speech and text tasks, consuming and producing tokenized audio. It can perform zero-shot speech-to-text translation for language combinations not seen in its training, and can transfer a speaker’s voice across languages based on a short spoken prompt.
Google has opted not to release the code for the model, instead publishing a series of examples on GitHub.
Researchers at rival Meta took a similar approach with their recently unveiled generative speech model, Voicebox, withholding it for fear it could be used for malicious purposes. Google’s researchers did not say why they chose not to publish the code, however.
Alongside AudioPaLM, Google has applied PaLM to various other fields to achieve sector-specific results, including Sec-PaLM, which can help cybersecurity experts detect malicious scripts, and Med-PaLM-2, which can help identify medical issues from images such as X-rays.