Trained on 1.1m hours of unlabeled audio data
Cambridge-based speech recognition firm Speechmatics has launched the Autonomous Speech Recognition engine.
The platform can detect voices regardless of accent and dialect – with the firm claiming it outperformed similar models from the likes of AWS, Google, and Apple.
“Our focus in tackling AI bias has led to this monumental leap forward in the speech recognition industry and the ripple effect will lead to changes in a multitude of different scenarios,” Katy Wigdahl, CEO of Speechmatics, said.
“Think of the incorrect captions we see on social media, court hearings where words are mistranscribed and eLearning platforms that have struggled with children's voices throughout the pandemic. Errors people have had to accept until now can have a tangible impact on their daily lives.”
Speechmatics was founded in 2006 by Dr. Tony Robinson – a pioneer in applying recurrent neural networks to speech recognition. The company launched its cloud-based speech recognition services in 2012.
It raised a total of $8.2m in funding across two rounds, most recently bringing in £6.4m ($8.8m) in a Series A in late 2019. AlbionVC and IQ Capital led that round.
Now, Speechmatics has launched a new speech recognition engine that promises improved accuracy.
The startup said that when using datasets from Stanford’s ‘Racial Disparities in Speech Recognition’ study, its software bested other systems for African American voices, with an accuracy score of 82.8 percent compared to Google (68.7 percent) and Amazon (68.6 percent).
Speechmatics said its software also outperformed competitors on children’s voices – recording 91.8 percent accuracy compared to Google (83.4 percent) and Deepgram (82.3 percent).
Such accuracy equates to a 45 percent reduction in speech recognition errors – the equivalent of three words in an average sentence, the company said.
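The 45 percent figure can be checked against the accuracy scores quoted above. The sketch below is illustrative only: it assumes the error rate is simply 100 minus the reported accuracy, and the function name `relative_error_reduction` is a hypothetical helper, not something from Speechmatics. Under that assumption, the African American voices scores give a reduction of about 45 percent against both Google and Amazon, which suggests the company's figure refers to that benchmark rather than the children's voices result.

```python
def relative_error_reduction(own_accuracy: float, other_accuracy: float) -> float:
    """Percentage drop in error rate, assuming error = 100 - accuracy."""
    own_error = 100.0 - own_accuracy
    other_error = 100.0 - other_accuracy
    return 100.0 * (other_error - own_error) / other_error


# Accuracy scores (percent) as quoted in the article.
SPEECHMATICS_AFRICAN_AMERICAN = 82.8
SPEECHMATICS_CHILDREN = 91.8
african_american_rivals = {"Google": 68.7, "Amazon": 68.6}
children_rivals = {"Google": 83.4, "Deepgram": 82.3}

if __name__ == "__main__":
    for name, accuracy in african_american_rivals.items():
        reduction = relative_error_reduction(SPEECHMATICS_AFRICAN_AMERICAN, accuracy)
        print(f"African American voices vs {name}: {reduction:.1f}% fewer errors")
    for name, accuracy in children_rivals.items():
        reduction = relative_error_reduction(SPEECHMATICS_CHILDREN, accuracy)
        print(f"Children's voices vs {name}: {reduction:.1f}% fewer errors")
```

The same calculation on the children's voices scores gives a larger relative reduction, so the two benchmarks should not be read as sharing the 45 percent figure.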
“It's critical to study and improve fairness in speech-to-text systems given the potential for disparate harm to individuals through downstream sectors ranging from healthcare to criminal justice,” said Allison Zhu Koenecke, lead author of the Stanford study on speech recognition.
Speechmatics’ technology is trained on unlabeled data taken directly from the internet, such as social media content and podcasts. Using self-supervised learning, its software is now trained on 1.1m hours of audio.
“This delivers a far more comprehensive representation of all voices and dramatically reduces AI bias and errors in speech recognition,” the company’s launch statement reads.