OpenAI Launches Whisper AI Transcription Tool

Whisper can accurately translate multilingual audio into English.

Ben Wodecki, Jr. Editor

September 27, 2022

OpenAI, the AI research company behind GPT-3, DALL-E and the Codex model that powers GitHub Copilot, has unveiled its latest project: Whisper, an open-source speech recognition system.

Whisper, which was trained on 680,000 hours of multilingual data collected from the web, can transcribe speech in multiple languages.

According to OpenAI, around one-third of the audio used to train Whisper is non-English, and the system can accurately translate speech in those languages into English.

OpenAI said it is open sourcing the models and inference code for Whisper to provide “a foundation for building useful applications and for further research on robust speech processing.”

“We hope Whisper’s high accuracy and ease of use will allow developers to add voice interfaces to a much wider set of applications,” a company blog post reads.

The code is available on GitHub. Whisper is also accessible via Hugging Face, where users can test the tool's audio-to-text transcription abilities.

Whisper: How it works

Whisper was built using an end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, each chunk is converted into a log-Mel spectrogram, and the result is passed into an encoder.
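The chunking step can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it assumes mono audio already resampled to 16 kHz (the rate OpenAI's released code works at), the helper name `split_into_chunks` is hypothetical, and it stops before the log-Mel spectrogram step. It also does not reproduce how the released code advances through long recordings.

```python
# Hypothetical sketch of the first preprocessing step: cut a mono waveform
# into fixed 30-second windows, zero-padding the final one.
# Assumption: input is already resampled to 16 kHz.
from typing import List

SAMPLE_RATE = 16_000                          # samples per second (assumed)
CHUNK_SECONDS = 30                            # window length described by OpenAI
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples per window


def split_into_chunks(samples: List[float]) -> List[List[float]]:
    """Return consecutive 30-second windows; the last one is zero-padded."""
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = samples[start:start + CHUNK_SAMPLES]
        # Pad the tail so every window handed to the encoder has equal length.
        chunk = chunk + [0.0] * (CHUNK_SAMPLES - len(chunk))
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    audio = [0.0] * (SAMPLE_RATE * 70)        # 70 seconds of silence
    windows = split_into_chunks(audio)
    # 70 s of audio -> 3 windows of 480,000 samples each
    print(len(windows), [len(w) for w in windows])
```

Fixed-length windows keep the encoder's input shape constant, which is why the short final chunk is padded rather than passed through as-is.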

A decoder is then trained to predict the corresponding text caption, intermixed with tokens that direct the model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription and speech translation to English.
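The task-specifying tokens can be sketched as a short prefix that conditions the decoder. The token spellings below (`<|startoftranscript|>`, a language tag such as `<|en|>`, `<|transcribe|>` or `<|translate|>`, and `<|notimestamps|>`) follow the special tokens described in OpenAI's paper and tokenizer, and the 20-millisecond timestamp resolution also comes from the paper. The helpers `decoder_prefix` and `timestamp_token` are hypothetical illustrations, not functions in the whisper package. When the model performs language identification, it predicts the language tag itself rather than receiving it.

```python
# Hypothetical helpers illustrating how special tokens steer Whisper's decoder.
# Token spellings follow OpenAI's paper/tokenizer; the functions are illustrative only.
from typing import List


def decoder_prefix(language: str, task: str, timestamps: bool) -> List[str]:
    """Build the token prefix that tells the decoder which task to perform."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Ask for plain text with no timestamp tokens interleaved.
        tokens.append("<|notimestamps|>")
    return tokens


def timestamp_token(seconds: float) -> str:
    """Format a time as a timestamp token, quantized to 20 ms steps."""
    quantized = round(seconds / 0.02) * 0.02
    return f"<|{quantized:.2f}|>"
```

For example, `decoder_prefix("fr", "translate", False)` yields a four-token prefix asking for an English translation of French speech without timestamps; swapping `"translate"` for `"transcribe"` asks for French text instead.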

According to OpenAI, other existing approaches use either smaller, more closely paired audio-text training datasets, or broad but unsupervised audio pretraining.

Unlike those, Whisper was trained on a large and diverse dataset and was not fine-tuned to any specific benchmark dataset.

In a paper outlining the system, OpenAI's researchers report that, when measured zero-shot across several diverse datasets, Whisper made 50% fewer errors than rival models specialized for LibriSpeech, a widely used speech recognition benchmark.
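Speech recognition errors are conventionally counted as word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn a system's output into the reference transcript, divided by the number of reference words. The Python sketch below assumes the "errors" in OpenAI's comparison are measured this way; the function name `word_error_rate` is hypothetical, and the sketch skips the text normalization OpenAI applies before scoring.

```python
# Minimal WER sketch: word-level Levenshtein distance / reference length.
# Assumption: plain whitespace tokenization, no text normalization.
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edits to turn the reference words processed so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)
```

Under this metric, "50% fewer errors" means, for example, going from a WER of 20% to 10% on the same test set.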

AI transcription has become an essential enterprise tool. Established tools, including Trint and Airgram, can transcribe audio from videoconferencing platforms like Microsoft Teams.

But Whisper could, theoretically, perform similar tasks via web browsers and would not require a subscription, unlike its competitors.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
