Justice officials from the United Arab Emirates are seeking help from US authorities to investigate fraudsters who cloned voices to nab $35m from an unwitting bank manager.
Last January, a UAE-based bank manager spoke with a company director on the phone – the pair had spoken before.
The director told the manager that his company was going to make an acquisition and needed the bank to authorize $35m in transfers.
A lawyer by the name of Martin Zelner was coordinating the transaction.
The bank manager had received emails from both Zelner and the company official, which detailed where the money was to be transferred.
Unaware that the caller was an impostor, the manager approved the funds – and the $35m was in fact transferred to criminals, who had effectively recreated the company official’s voice.
A complex scheme
The scammers used "deep voice technology to simulate the voice of the director," a court document obtained by Forbes reads.
The seven-page doc, filed in the US District Court for the District of Columbia, states that the Dubai Public Prosecution Office requested help from US authorities in tracking down the 'deep voice' fraudsters.
The $35m was transferred to “several” bank accounts in “other countries” in what UAE authorities describe as “a complex scheme involving at least 17 known and unknown defendants.”
Days after the unfortunate call, around $415,000 was sent to two Centennial bank accounts located in the US.
Emirati authorities have asked their US counterparts to provide bank records and information on those accounts.
The more material, the better the voice
Deep voice technology essentially uses machine learning models to create audio ‘deepfakes.’
To recreate a voice, such systems require audio recordings of the speaker. Those files are fed into a model, which learns the voice and then replicates it.
To achieve an accurate representation, the systems require around two to three hours of voice recordings, according to Andrea Hauser, an IT security consultant from scip.
“The more such material is available, the better the audio deepfake will be,” Hauser said.
The AI Business team managed to recreate the voice of Sir Patrick Stewart for a recent podcast episode with the help of Uberduck.ai – a free-to-use synthetic voice tool.
But while we used it for laughs, the technology is being adopted for more nefarious reasons.
The ‘Zelner’ case is the second widely publicized instance of fraudsters having recreated voices for their illicit activities.
Criminals impersonated the CEO of a UK energy company in an attempt to steal $240,000 in 2019, according to the Wall Street Journal.
In that case, the initial payment was made, but the crooks tried calling several more times to get more money – and failed.
The $240,000 was transferred to a Hungarian bank account and was subsequently moved to Mexico and other locations. No suspects were ever identified.
The emergence of synthetic voice
Besides Uberduck.ai, what other voice synthesis tools are already out there?
TTS from Mozilla and tacotron2 from NVIDIA are two publicly accessible tools – however, both require deep technical understanding.
MARVEL.ai from Veritone is another. This Voice-as-a-Service (VaaS) product was designed to allow brands to create, manage, license, and monetize synthetic speech – including “hyper-realistic” celebrity voices – through AI.
MARVEL.ai customers can request a particular voice model for auto-generated content or attempt to create their own.
Good Mythical Morning, the popular YouTube show, was among Veritone’s first VaaS customers. In September, show hosts Rhett and Link struck a deal to use MARVEL.ai to create voices for their content.