by Fiona McEvoy
SAN FRANCISCO – Voice-controlled technologies have made steady progress lately. The adaptability and applicability of voice as an interface are beginning to surprise us all. At the Startup Grind Festival, a handful of seasoned “voice entrepreneurs” described to eager newbies how sports fans are already calling on virtual assistants to read out their team’s results, and how we’ll all soon be using conversational AI to select our clothes as part of the regular morning routine. And that’s just in the home. There’s also a lot of chatter about how voice control could take on some of the heavy lifting in the workplace.
The message is clear: voice is here to stay. We’re tired of scrolling, sorting and reviewing. We’re ready for an army of intelligent servants to do our bidding.
So what – if anything – will hold this platform back, as many more of us choose to make words louder than actions?
1) We’re not buying
Last summer it was reported that only 2% of 50m Alexa owners had used the platform to actually buy something in 2018. That’s despite “voice shopping” revenue being projected to hit $40bn (UK & US) by 2022. The 2% figure has been disputed, but if it’s even close to the truth, then this is more than a minor problem.
Companies like Amazon and Google want voice to become a platform that makes them multiple billions of dollars, not a device we only use to play music and ask about the weather.
Perhaps our online browsing habits will die hard, but if we don’t start using voice to buy goods and services soon, then our voice-controlled assistants are doomed to remain household accessories, not necessities.
2) We can’t hold a real conversation
And when we get close to human-like interaction, people get freaked out. The truth is, developers aren’t anywhere near ready to launch a voice command system that can hold a convincing conversation. The tech just isn’t there. Though it’s true that speech recognition has become stunningly precise – and almost as accurate as humans – there’s still a large and presently insurmountable void between a system recognizing words and having semantic understanding.
Until machines and their applications can get to grips with human intention, the experience of voice is likely to feel stilted and frustrating. We’ll probably stay (mainly) behind our keyboards until we’re reassured that voice can match their efficiency.
3) We don’t think it’s secure
All tech has security blind spots, but how can you begin to secure activities like banking over a voice-controlled application? Vijay Balasubramaniyan, the CEO of Pindrop, a company that has developed software to do just that, told the Startup Grind Festival that each person’s voice has unique characteristics. In essence, this means that voice could become part of our identifying biometric data (like a fingerprint or iris).
So are we good to go? Well, not quite.
Even Balasubramaniyan admits that voices come under different types of strain – like age or illness – and this is a problem for voice security. Indeed, he suggested that President Obama’s voice changed so much over his years in office that by year two it would no longer have passed security measures trained on the sound of his voice when he entered the White House in 2009.
There were also warnings circulating about the “deepfake”-style challenges facing voice-controlled tech. With as little as two minutes of recorded speech, our dulcet tones can already be accurately spoofed by motivated hackers…
So do these sticking points undermine the burgeoning enthusiasm for this emerging medium? Not necessarily. The value of hands-free instruction should not be hastily dismissed by anyone. Indeed, for many users – not least disabled users – the accelerated development of this type of technology has been profoundly life-changing.
Nevertheless, it is worth makers considering how smoothly users will be able to progress from “tapping to talking” in different domains. If a voice-controlled design cuts out a natural part of the process (as with shopping), makes for a clunky interaction (as with anything that requires fluent conversation), or makes us more nervous (as with personal admin), there will still be thinking to do…