Amir Konigsberg, founding director of Hour One, outlines his team’s work to bring animated characters to life using AI
It was never obvious that 2017’s The Boss Baby would become a massive commercial hit.
The animated flick saw Alec Baldwin voice a secret agent – who is also a baby – tasked with winning an ongoing war between babies and puppies for the world’s share of love.
The premise was truly bizarre and the reviews lukewarm, at just 53 percent on Rotten Tomatoes, and yet it cannot be overstated just how much kids loved this film.
So much so that a television series was greenlit almost immediately, hitting Netflix the following year.
And now, it’s produced a sequel. The Boss Baby: Family Business hit cinemas this October, but so did something else: an AI-powered version of the titular animated character able to provide personalized video messages on Cameo.
For those not in the know, Cameo is a relatively new online service that allows users to purchase video messages or shoutouts from famous personalities.
It proved a hit during the pandemic, with countless celebrities, from Floyd Mayweather to Brent Spiner, unable to attend in-person events, but now able to charge hundreds of dollars for a thirty-second video greeting.
Take Brian Baumgartner, best known as Kevin from The Office. He became the platform’s most requested name, charging $195 per video.
And now, users can request a message from an animated character: none other than Ted Templeton, the Boss Baby.
“We’re extremely encouraged by the early results of Boss Baby and believe that AI can be one of the most scalable ways for fans to connect directly with their favorite characters, something that was previously impossible,” Arthur Leopold, chief business officer at Cameo, told AI Business.
“We expect to help bring more characters and IP to fans through personalized Cameo content in the near future.”
The technology behind the service comes from Hour One, a ‘video transformation’ company that creates synthetic characters.
Amir Konigsberg, founding director and board member at Hour One, spoke with AI Business to discuss the innovative idea and the many ways to apply it.
Cookies are for closers
Two weeks after the product launched on the platform, Konigsberg said Cameo users “loved” it. Priced at just $15, it costs around $100 less than the platform's only non-AI Baldwin, Alec’s brother Billy.
The sample video does sound like Alec Baldwin. It has the inflections one would expect on specific words, and even reproduces the sound of the actor breathing in between phrases.
The emotional range of the clip isn't as vast as what you can see in the trailer – but it's tough to gauge the effectiveness from a 15-second video.
Looking at the comments, there is a plethora of five-star reviews – but not everyone is impressed. “Horrible. Just a bot with no customization. I put something in parentheses for clarification and it was still read. Total ripoff,” one review reads.
“Clearly prerecorded with a deepfake voice saying our names,” reads another.
Konigsberg said his team had "many" videos requested – though offered no specifics. He suggested users are enjoying the relatively low price point, and added, “We're delighted by this and we're excited at the quality that we're generating.”
Avoiding the uncanny valley
How does it work? Konigsberg explained that his team captures the facial and mouth movements of a person. The information is then fed to the company’s AI platform, which processes it, and then attempts to replicate the captured individual.
How much time does Hour One require to fully re-create someone? For example, Star Trek’s William Shatner spent five days with StoryFile for its conversational AI video project.
Accurate representation of voice requires around two to three hours of voice recordings, according to Andrea Hauser, an IT security consultant from scip.
Konigsberg said the length of the project depended on data. “It completely depends on the material. And the quality of data. If the data is good, you need less time,” he said.
“Assuming the subject is captured in good surroundings with high-quality video, based on that it can take a very, very short amount of time – under an hour, or sometimes, under thirty minutes.”
"Once we're able to cover the spectrum of different facial movements that we need, then we don't really need anything else."
Konigsberg explained that for the company’s output to be considered high quality, it has to look natural.
“To make it real, and make it high quality, you have to regenerate the face every time based on the type of voice that you're using.”
Cameo was a clear success: celebrities could earn the kind of money they would otherwise make selling autographs at places like Comic-Con, without having to leave their mansions.
Is there potential for the Hour One tech to be applied to humans? Could we purchase a computer-generated happy birthday message from Big Ed from 90 Day Fiancé?
Most definitely, according to Konigsberg, and it’s something the company has worked on.
“We have a platform called Reals, where you can choose from hundreds of characters and voices and input any text you want and the output is a video speaking that text which looks completely real and natural.”
“We can and have created many, many hundreds of thousands, even millions of videos based on human characters. [The Boss Baby] project is the first of its kind and it’s really exciting for us, NBC Universal, DreamWorks, and Cameo.”
And it’s not just movies: Konigsberg spoke of helping companies monetize video game characters and other digital assets – “once you do this through AI, you can maintain the core IP behind the character that you've built but actually replicate it and make it work in completely novel contexts.”
“Boss Baby is a great example of that because fans are sending videos to each other and they're personal.”