OpenAI launches AI transcription tool Whisper

Whisper can accurately translate multilingual audio into English.

Ben Wodecki

September 27, 2022



OpenAI, the AI research company behind GPT-3, DALL-E and Codex, the model that powers GitHub Copilot, has unveiled its latest project: Whisper, an open-source speech recognition system.

Whisper, which was trained on 680,000 hours of audio data collected from the web, can transcribe speech in multiple languages.

According to OpenAI, around one-third of the audio used to build Whisper is non-English, with the system able to accurately translate those languages into English.

OpenAI said it is open sourcing the models and inference code for Whisper to provide “a foundation for building useful applications and for further research on robust speech processing.”

“We hope Whisper’s high accuracy and ease of use will allow developers to add voice interfaces to a much wider set of applications,” a company blog post reads.

The code is available via GitHub. Whisper is also accessible via Hugging Face, where users can test the tool’s audio-to-text transcription abilities.

Whisper: How it works

Whisper was built using an end-to-end approach. Essentially, the input audio is split into 30-second chunks, converted into a log-Mel spectrogram and then passed into an encoder.
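The chunking step can be illustrated with a minimal Python sketch. The 16 kHz sampling rate, the zero-padding of the final window and the function name `split_into_chunks` are assumptions made for illustration; none of them is stated in the article.

```python
from typing import List

SAMPLE_RATE = 16_000   # assumed sampling rate in Hz (not given in the article)
CHUNK_SECONDS = 30     # window length described in the article
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_chunks(samples: List[float]) -> List[List[float]]:
    """Split mono audio samples into fixed 30-second windows.

    Hypothetical helper: the last window is zero-padded (silence)
    so that every chunk has the same length.
    """
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = samples[start:start + CHUNK_SAMPLES]
        # Pad a short final chunk so all windows are equally long.
        chunk = chunk + [0.0] * (CHUNK_SAMPLES - len(chunk))
        chunks.append(chunk)
    return chunks


# 70 seconds of dummy audio yields three windows: 30 s, 30 s and 10 s plus padding.
dummy_audio = [0.1] * (SAMPLE_RATE * 70)
windows = split_into_chunks(dummy_audio)
print(len(windows), len(windows[-1]))
```

Each fixed-length window would then be turned into features and handed to the encoder; that step is left out here because it depends on the model's own preprocessing.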

A decoder is then trained to predict the corresponding text caption, intermixed with tokens that direct the model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription and speech translation to English.
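To make the idea of task-directing tokens concrete, here is a small Python sketch that assembles the kind of token prefix the decoder is conditioned on. The helper `build_decoder_prompt` is hypothetical and works on plain strings rather than a real tokenizer; the token spellings follow those used in OpenAI's released code, but treat the exact layout as an assumption.

```python
from typing import List


def build_decoder_prompt(language: str, task: str = "transcribe",
                         timestamps: bool = False) -> List[str]:
    """Assemble an illustrative decoder prefix as a list of special tokens.

    Hypothetical helper for illustration only: it shows how a language
    token, a task token and a timestamp switch can steer one model
    toward different jobs.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Tell the decoder to emit text only, without timestamp tokens.
        tokens.append("<|notimestamps|>")
    return tokens


# French speech, translated into English text, no timestamps.
print(build_decoder_prompt("fr", task="translate"))
```

Changing only the prefix, for example swapping the task token or dropping the no-timestamps token, is what lets a single decoder cover transcription, translation and timestamp prediction.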

According to OpenAI, other existing approaches use either smaller, more closely paired audio-text training datasets, or broad but unsupervised audio pretraining.

Unlike those, Whisper was trained on a large and diverse dataset and was not fine-tuned to any specific benchmark dataset.

In a paper outlining the system, OpenAI’s researchers report that Whisper does not beat models that specialize in the LibriSpeech benchmark, but that its zero-shot performance across many diverse datasets is more robust, with 50% fewer errors than those models.
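For readers unfamiliar with how such error figures are typically computed, the sketch below calculates word error rate (WER) and a relative error reduction in Python. The article does not name the metric, so WER is an assumption here, and the sentences and rates used are invented for illustration rather than taken from OpenAI's results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Made-up example: one dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
# Made-up rates: halving the error rate is a 50% relative reduction.
print(relative_error_reduction(0.20, 0.10))
```

Under this reading, "50% fewer errors" means the relative reduction is about 0.5, regardless of how large the baseline error rate is.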

AI transcription has become an essential enterprise tool. Established tools including Otter.ai, Trint and Airgram can transcribe audio via videoconferencing platforms like Microsoft Teams.

But Whisper could, theoretically, perform similar tasks via web browsers and would not require a subscription, unlike its competitors.

About the Authors

Ben Wodecki

Assistant Editor
