Microsoft’s recently proposed $19.7 billion acquisition of language specialist Nuance shows that the future of speech recognition is bright.
However, as this technology is integrated into an ever-growing list of products and services, access issues will undoubtedly arise.
Speech recognition models can discriminate against users with atypical speech.
The same tech can be adapted to help those with speech, hearing, or visual impairments, as well as the elderly.
Google’s Project Euphonia, for example, is using AI to improve how devices understand diverse speech patterns. The search engine giant is also training its Google Assistant to recognize users with speech impairments.
Around the same time it launched Project Euphonia, Google unveiled Live Relay, an app that uses on-device speech recognition and text-to-speech conversion to assist those who are unable to speak or hear, so they can participate in phone calls.
Another company working in this direction is Voiceitt, a Tel Aviv-based startup which enables individuals with atypical speech to communicate and be understood.
Voiceitt has developed an app that can interpret a user's input and transform it into normalized speech, to be output as audio or text. In late 2020, the startup secured a major customer in Amazon with Alexa integration, allowing users with speech impairments to use the smart speakers. Amazon said at the time that the move would “open new doors for mobility and independence.”
AI Business caught up with Voiceitt founder and CEO Danny Weissberg to talk about the value of speech technology and its exciting, emerging applications.
Weissberg said that individuals with non-standard speech – something that can be caused by stroke, cerebral palsy, amyotrophic lateral sclerosis (ALS), or Down syndrome – cannot communicate effectively with standard speech recognition technologies found in smartphones or smart home devices.
He suggested that those devices “were generally not created to support non-standard speech.”
To combat the issue, the team at Voiceitt has developed an AI-powered speech recognition app that can learn the unique patterns of an individual's way of speaking, independent of the language.
“Voiceitt’s AI component enables it to continuously improve its ASR [Automatic Speech Recognition] capabilities by learning to identify speech patterns from specific voice samples around the world and analyzing individual users' speech patterns including utterances, cadence, breathing pauses, and more.
“The more people use the app, the larger the corpus of voice samples, and, subsequently, the smarter and more effective the AI technology,” Weissberg said.
In practice, the app allows users to communicate with loved ones and caretakers. Weissberg said it can also enable users to talk to devices like Alexa to independently perform daily tasks such as turning on the lights, changing the TV channel, or playing music — “offering many a degree of independence they never had before.”
Support from the top
Weissberg explained that Voiceitt has benefited greatly from the support of large corporations, including Microsoft’s M12 Venture Capital Fund and the Alexa Fund.
He suggested that the voice tech revolution has accelerated over the past few years, bringing life-changing improvements to many individuals.
“For instance, one of our users is a young woman with a speech and motor disability living in a care facility. Every night she would sleep with the light on in her room because she felt uncomfortable ‘bothering’ a nurse to perform such a simple task. With Voiceitt, she is now able to tell Alexa to turn off the light all on her own, without the nurse’s assistance.”
Weissberg noted that for those who struggle to talk, or are unable to, having to rely on others can affect their sense of self and independence – something that could be solved by using virtual assistants.
Weissberg revealed that his team is developing a software development kit (SDK) to make it easier to integrate its models into devices and services.
“For example, the healthcare system is beginning to move towards voice-based AI systems – as indicated by Microsoft’s recent acquisition of Nuance – and there is significant potential for Voiceitt in that field,” he said.
“In the restaurant industry, there has been a trend towards adapting voice AI technologies to drive-throughs, mobile ordering apps, and other devices.”
He added that Voiceitt is already talking with several educational institutions about potentially leveraging its technology. Other sectors that could benefit from Voiceitt include online retail, travel, and transportation.
Microsoft may be spending an incredible amount of money on Nuance, but you can’t put a price tag on giving someone their independence, or a voice.