Power speech recognition applications on IoT devices with this small new system from Hugging Face
AI engineers from open source AI platform Hugging Face have created a speech recognition system designed to work in low-memory environments.
Distil-small.en has just 166 million parameters, yet it is six times faster than OpenAI’s Whisper v2 while being 49% smaller.
The small system is a distilled version of the Whisper model. It’s designed to be used in deployments where space and processing power are limited.
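To give a rough sense of what a model of this size means for a memory-constrained device, the sketch below estimates the storage needed for the weights alone. Only the 166 million parameter count comes from the article. The numeric formats and the `weight_megabytes` helper are illustrative assumptions, and real memory use will be higher once activations, audio buffers and runtime overhead are counted.

```python
# Back-of-the-envelope estimate of weight storage for a model.
# Only the parameter count is taken from the article; the formats
# listed here are common choices, not confirmed deployment settings.

PARAM_COUNT = 166_000_000

# Bytes used to store one parameter in each numeric format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(param_count: int, dtype: str) -> float:
    """Return the size of the weights alone, in megabytes (10**6 bytes)."""
    return param_count * BYTES_PER_PARAM[dtype] / 1_000_000


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_megabytes(PARAM_COUNT, dtype):.0f} MB")
```

Under these assumptions the weights take roughly 664 MB in 32-bit floats, 332 MB in 16-bit floats and 166 MB if quantized to 8-bit integers.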
For example, distil-small.en could be used to power voice controls in IoT devices like smart home controllers or even cars with smart speakers. Given its size, the system could even be integrated into mobile apps for real-time speech recognition, potentially for translation apps or voice-activated assistants.
The Hugging Face team has been working on distilled versions of OpenAI’s Whisper for some time. This latest version uses four decoder layers, compared to the prior two. Sanchit Gandhi, a machine learning research engineer at Hugging Face, said on X (Twitter) that the extra decoder layers “help preserve the model's transcription accuracy at very small model sizes.”
In terms of performance, distil-small.en achieves a higher score in lower latency environments compared to the original Whisper and other distilled versions. However, for environments where more memory is available, the Hugging Face team recommends using either distil-medium.en or distil-large-v2 “since they are both faster and achieve better Word Error Rate (WER) results.”
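For readers unfamiliar with the metric, Word Error Rate is the number of word-level substitutions, deletions and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. Lower is better. The sketch below is a minimal illustration of that definition, not the evaluation code Hugging Face used. The `word_error_rate` function name, the lowercase-and-split normalization and the sample sentences are all assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Simplified normalization (an assumption): lowercase, split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one dropped word and one wrong word out of five.
wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
print(f"WER: {wer:.0%}")
```

In this made-up example the transcript drops "the" and mishears "lights", so two errors against five reference words give a WER of 40%.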
The distilled versions of Whisper made by Hugging Face are currently only available for English speech recognition. The team behind the system said they’re working on applying it to other languages.
Distil-small.en is available via Hugging Face under an MIT license, meaning it’s suitable for commercial purposes. Users are, however, required to retain copyright and permission notices in all copies of the software.
Hugging Face showed off the model being used for transcribing both short and long-form audio files.
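As a rough idea of what such transcription can look like in code, here is a minimal sketch using the `pipeline` API from the Hugging Face Transformers library. It is not taken from Hugging Face's own demo. The Hub ID `distil-whisper/distil-small.en`, the chunk length and batch size, and the `pipeline_call_kwargs` and `transcribe` helpers are assumptions for illustration. Passing a file path to the pipeline also assumes ffmpeg is installed to decode the audio.

```python
# Minimal sketch: transcribing audio with the Transformers ASR pipeline.
# MODEL_ID is the assumed Hub identifier for distil-small.en.
MODEL_ID = "distil-whisper/distil-small.en"


def pipeline_call_kwargs(long_form: bool) -> dict:
    """Extra call arguments; the long-form values are illustrative guesses."""
    if long_form:
        # Split long recordings into chunks and transcribe them in batches.
        return {"chunk_length_s": 15, "batch_size": 8}
    return {}


def transcribe(audio_path: str, long_form: bool = False) -> str:
    """Return the transcript for an audio file (hypothetical helper)."""
    # Imported here so the rest of the module loads without transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path, **pipeline_call_kwargs(long_form))
    return result["text"]
```

Calling `transcribe("clip.wav")` would handle a short recording, and `transcribe("meeting.mp3", long_form=True)` a longer one. Both file names are placeholders, and the first call downloads the model weights.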
There are also inference examples on the right-hand side of distil-small.en’s Hugging Face page, where you can hear its speech recognition abilities in action.