December 8, 2023
Engineers at open-source AI platform Hugging Face have created a speech recognition system designed to work in low-memory environments.
Distil-small.en has just 166 million parameters, yet it is six times faster than OpenAI’s Whisper v2 while being 49% smaller.
The small system is a distilled version of the Whisper model. It’s designed to be used in deployments where space and processing power are limited.
For example, distil-small.en could be used to power voice controls in IoT devices like smart home controllers or even cars with smart speakers. Given its size, the system could even be integrated into mobile apps for real-time speech recognition, potentially for translation apps or voice-activated assistants.
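To put the 166 million parameter figure in context, the rough sketch below estimates how much memory the model weights alone would occupy at different numeric precisions. The bytes-per-parameter values are standard for these data types, but the function name and the choice to count only weights (ignoring activations, audio buffers and runtime overhead) are assumptions made for illustration, not figures published by Hugging Face.

```python
# Back-of-the-envelope estimate of weight storage for a model.
# Assumption: only the weights are counted; activations and runtime
# overhead are ignored, so real memory use will be higher.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mb(num_params: int, dtype: str = "float32") -> float:
    """Return the approximate size of the weights in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


# 166 million parameters, the size reported for distil-small.en.
DISTIL_SMALL_PARAMS = 166_000_000

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_mb(DISTIL_SMALL_PARAMS, dtype):.0f} MB")
```

Under these assumptions, half precision roughly halves the footprint compared with full precision, which is one reason small models are attractive for phones and embedded devices.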
The Hugging Face team has been working on distilled versions of OpenAI’s Whisper for some time. This latest version uses four decoder layers, compared with two in the earlier distilled models. Sanchit Gandhi, a machine learning research engineer at Hugging Face, said on X (formerly Twitter) that the extra decoder layers “help preserve the model's transcription accuracy at very small model sizes.”
In terms of performance, distil-small.en outscores the original Whisper and the other distilled versions in low-latency environments. However, for environments where more memory is available, the Hugging Face team recommends using either distil-medium.en or distil-large-v2 “since they are both faster and achieve better Word Error Rate (WER) results.”
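Word Error Rate is the number of word-level substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal pure-Python illustration of that definition. It is not the evaluation code used by Hugging Face, and the lowercasing and whitespace splitting are simplifying assumptions; real benchmarks typically apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER = word-level edit distance / number of reference words."""
    # Simplifying assumption: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the transcript "the cat sit on mat" gives one substitution and one deletion over six reference words, a WER of about 0.33. Lower is better, and a perfect transcript scores 0.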
The distilled versions of Whisper made by Hugging Face are currently only available for English speech recognition. The team behind the system said they’re working on applying it to other languages.
Distil-small.en is available via Hugging Face under an MIT license, meaning it’s suitable for commercial purposes. Users are, however, required to retain copyright and permission notices in all copies of the software.
Hugging Face showed off the model being used for transcribing both short and long-form audio files.
There are also inference examples on the right-hand side of distil-small.en’s Hugging Face page where you can hear its speech recognition abilities in action.
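For readers who want to try transcription locally, the sketch below shows one way this might look with the `pipeline` helper from the Hugging Face `transformers` library. Several details are assumptions rather than facts from this article: the model identifier `distil-whisper/distil-small.en`, the 15-second chunk length for long-form audio, and the helper function names. It also assumes `transformers`, PyTorch and ffmpeg are installed, and it downloads the model weights on first use.

```python
# Assumed Hub identifier for the model; check the model page before relying on it.
MODEL_ID = "distil-whisper/distil-small.en"


def pipeline_kwargs(long_form: bool = False) -> dict:
    """Build keyword arguments for the speech recognition pipeline.

    Assumption: long recordings are split into 15-second chunks.
    """
    kwargs = {"model": MODEL_ID}
    if long_form:
        kwargs["chunk_length_s"] = 15
    return kwargs


def transcribe(audio_path: str, long_form: bool = False) -> str:
    """Transcribe an English audio file and return the recognized text."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", **pipeline_kwargs(long_form))
    return asr(audio_path)["text"]
```

Calling `transcribe("meeting.wav", long_form=True)`, where `meeting.wav` is a hypothetical file name, would return the transcript as a single string.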
Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.