By Ian Firth, Speechmatics, 19 February 2020
Automatic Speech Recognition (ASR) used to be limited to science fiction. You may recall Star Trek’s Captain Picard interrogating his ship’s computer via voice interface. Thirty years on from The Next Generation, we find speech recognition technology deeply integrated into our lives both at home and at work.
In fact, the speech recognition technology market is in a huge growth cycle and is expected to be valued at a staggering US$4.1 billion in 2024.
The ASR industry has come a long way since speech-to-text APIs first became commercially available. These systems are no longer confined to relatively straightforward uses, such as automatically subtitling TV or film content or turning on your lights at home. Complex integrations now derive insight from voice data across diverse sectors such as financial services, healthcare, retail, and media and entertainment. Vendors can extract meaning from data that was previously unavailable: not only voice triggers, but entire conversations, languages and nuances across multiple speakers.
ASR technology is being used to offer real-time transcription, and it is used extensively in the call center industry. Accurate speech recognition gives the call center analysis of customer conversations, which helps deliver insight into purchasing habits and gauge emotional sentiment throughout the customer service journey. It can also be used to prevent mis-selling and to ensure regulatory compliance by capturing potentially sensitive information during customer calls and alerting organizations to its presence.
On a broader level, the technology enables a full-scale overhaul of customer experience, allowing call center teams, for example, to understand far more quickly what customers are saying and to act precisely on it. It also aids people who are hearing-impaired or situationally disadvantaged.
In the case of ASR, businesses should aspire to create enterprise applications that use voice data in real time, identifying context, punctuation and dialect across hundreds of languages, on demand and worldwide. This not only produces more accurate and efficient results than human teams, but also maximizes revenue by reducing overhead and providing direct, tangible results to senior and board-level executives.
Analyzing a single stream of data (text, video or audio) will soon no longer be enough for the enterprise. In the coming months and years, an increasing number of vertical industries will start harnessing the ‘full signal’ of all three, streamlining processes, empowering workers, and ultimately amplifying the ability of organizations to scale and reallocate resources to more profit-building, strategic pursuits. By harnessing the boundless potential of machine learning, which ‘learns as you learn’, ASR providers can transform the efficiency and profitability of corporate services, optimizing existing workflows and opening the door to new, innovative ways of understanding their customers.
Ian Firth is VP of Products at Speechmatics, a British company which develops automatic speech recognition software based on recurrent neural networks and statistical language modeling.