Revolutionary Small AI for Edge-Based Speech Recognition

Power speech recognition applications on IoT devices with this small new system from Hugging Face

December 8, 2023

An artistic representation of a speech recognition system.

Distil-small.en could power speech recognition on mobile devices.

Credit: AI Business via DALL-E 3

At a Glance

This new speech recognition system rivals OpenAI’s Whisper while being 49% smaller.

AI engineers from open-source AI platform Hugging Face have created a speech recognition system designed to work in low-memory environments.

Distil-small.en has just 166 million parameters, yet it is six times faster than OpenAI’s Whisper v2 and 49% smaller.

The small system is a distilled version of the Whisper model. It’s designed to be used in deployments where space and processing power are limited.

For example, distil-small.en could be used to power voice controls in IoT devices like smart home controllers or even cars with smart speakers. Given its size, the system could even be integrated into mobile apps for real-time speech recognition, potentially for translation apps or voice-activated assistants.

The Hugging Face team has been working on distilled versions of OpenAI’s Whisper for some time. This latest version uses four decoder layers, up from two in earlier versions. Sanchit Gandhi, a machine learning research engineer at Hugging Face, said on X (Twitter) that the extra decoder layers “help preserve the model's transcription accuracy at very small model sizes.”

In terms of performance, distil-small.en scores higher in low-latency environments than the original Whisper and other distilled versions. However, where more memory is available, the Hugging Face team recommends using either distil-medium.en or distil-large-v2 “since they are both faster and achieve better Word Error Rate (WER) results.”
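For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of reference words. Lower is better. The short Python sketch below illustrates that calculation; the function name and the sample sentences are illustrative only, and this is not Hugging Face's evaluation code. Published benchmarks typically normalize punctuation and spelling first, whereas this sketch only lowercases the text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word and one wrong word out of five.
score = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
print(f"WER: {score:.0%}")
```

Because insertions count as errors, WER can exceed 100% when a transcript contains many extra words.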

A results table showing the performance abilities of distil-small.en, a new small AI system designed to power speech recognition applications in low memory environments

Credit: Hugging Face

The distilled versions of Whisper made by Hugging Face are currently only available for English speech recognition. The team behind the system said they’re working on applying it to other languages.

Access distil-small.en

Distil-small.en is available via Hugging Face under an MIT license, meaning it can be used commercially. Users are, however, required to retain the copyright and permission notices in all copies of the software.

Hugging Face showed off the model being used for transcribing both short and long-form audio files.
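As a rough idea of what that looks like in practice, here is a minimal Python sketch using the `pipeline` helper from Hugging Face's Transformers library. Several details are assumptions rather than facts from this article: the repository name `distil-whisper/distil-small.en`, the 15-second chunk length for long recordings, and the helper function names, which are hypothetical. It also assumes `transformers`, PyTorch and ffmpeg are installed.

```python
# Assumed Hugging Face Hub repository name; check the model page before use.
MODEL_ID = "distil-whisper/distil-small.en"
# Assumed chunk length (seconds) for splitting long recordings; tune as needed.
CHUNK_LENGTH_S = 15


def pipeline_kwargs(long_form: bool) -> dict:
    """Build the keyword arguments passed to the speech recognition pipeline."""
    kwargs = {"model": MODEL_ID}
    if long_form:
        # Long audio is split into chunks that are transcribed and stitched together.
        kwargs["chunk_length_s"] = CHUNK_LENGTH_S
    return kwargs


def transcribe(audio_path: str, long_form: bool = False) -> str:
    """Transcribe an English audio file and return the recognized text."""
    # Imported here so the module loads even where transformers is not installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", **pipeline_kwargs(long_form))
    return asr(audio_path)["text"]


if __name__ == "__main__":
    # "meeting.mp3" is a placeholder path, not a file supplied with the model.
    print(transcribe("meeting.mp3", long_form=True))
```

The first call downloads the model weights, so it needs a network connection. Subsequent runs use the local cache.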

There are also inference examples on the right-hand side of distil-small.en’s Hugging Face page where you can hear its speech recognition abilities in action.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
