Google and OpenAI's New AI Agents Shift Focus to Consumers
Omdia analysts note a strategic shift as OpenAI and Google unveil consumer-focused AI agents at recent events
In the space of just two days, Google and OpenAI showcased what could be the next form factor for AI, marking a move away from static text interfaces: the AI agent.
At its annual I/O event, Google unveiled Project Astra, an ongoing research effort in which DeepMind engineers are attempting to build a universal AI agent: an assistant that can perform multiple tasks.
Astra runs on a smartphone, allowing the underlying AI to analyze the live camera feed and answer questions about objects around the user, responding in natural-sounding audio.
However, OpenAI narrowly edged out Google, showing off the new look of ChatGPT just 24 hours earlier. The chatbot has gone from a text-based interface to an assistant users can collaborate with from a smartphone or desktop.
ChatGPT can now respond to queries in a natural-sounding voice, producing a reply in milliseconds. App users can point ChatGPT at an object, such as a piece of text or a drawing, and discuss its contents with the chatbot, as in the live demo where presenters asked about simple, crudely drawn math problems.
And unlike Google’s research project, OpenAI is bringing its AI agent to consumers in the near term, rolling out early features this week.
The catalyst behind the new form factor is progress in multimodal foundation models.
Google’s agent is powered by its flagship Gemini 1.5 Pro model, leveraging DeepMind research spanning a variety of modalities, including video and image, which enables Astra to understand and interact with the objects around the user.
The new ChatGPT, meanwhile, makes use of GPT-4o, a new foundation model that processes inputs faster and has improved reasoning capabilities compared to GPT-4.
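For developers, GPT-4o's multimodal inputs are exposed through OpenAI's chat completions API, which accepts text and images in a single request. Below is a minimal sketch of the "point it at a drawing" interaction, assuming the openai Python package (v1.x); the image URL and the prompt are placeholders, not taken from the demo.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask GPT-4o about the contents of an image, mirroring the
# "point the app at a drawing" demo. The URL is a placeholder.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What math problem is written here, and what is the answer?",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/whiteboard-sketch.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```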
These breakthroughs enable foundation model makers to employ a more mature market approach, focusing on consumer integrations rather than the loftier goal of developing artificial general intelligence (AGI) or an AI that thinks like a human.
Why push back AGI research in favor of consumers? According to Bradley Shimmin, chief analyst, AI and data analytics at research firm Omdia, the answer harks back to the browser wars of the late ‘90s.
“By pushing generative AI into the basic operations of a phone, into the sidebar in a spreadsheet and of course into the web itself for search and increasingly action-oriented tasks, frontier model makers like Google will gain access to what is still the greatest commodity in the technology market, namely customer data,” Shimmin said.
“Only this time, instead of just using that data to drive ad revenue, user interactions (queries, conversations, prompts, etc.) will drive future value for model makers like Google by helping them build better, more capable models using actual, non-synthetic data.”
Eden Zoller, Omdia chief analyst of Applied AI, said the relaxed vibe and reassuring promises of OpenAI’s launch were a “public relations masterclass, but the firm needs to walk the responsible AI talk.”
“GPT-4o’s support for multimodal real-time vision and audio (joining text) opens avenues to innovation and new user experiences, but also new challenges for safety, data privacy and potential abuse,” Zoller said. “Positioning GPT-4o as a friendly, helpful personal companion is designed to foster trust but this could encourage over-reliance and unquestioning confidence in what is generated, which could be harmful if information or advice is wrong, inaccurate, or inappropriate.
“OpenAI will need to stay in front of this with robust guardrails and proceed with caution.”
Shimmin also picked up on the emotion-based interactions showcased in Google’s Astra demos, highlighting them as part of the industry’s shift toward AI being “more than a helper, instead serving as a friend and confidant.”
“Ironically, prompt engineering research has shown that when models are queried using emotionally charged phrases such as ‘This task may save my job’ or ‘I'll pay you extra for doing a good job,’ researchers found that model performance improved dramatically,” Shimmin said. “This matches new functionality demonstrated by OpenAI with its GPT-4o release earlier this week.”
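The effect Shimmin describes is straightforward to try: the studies simply appended an emotional stimulus to an otherwise unchanged prompt and compared outputs. A minimal, illustrative sketch follows, again assuming the openai Python package; the prompt and suffix wording are examples, not the exact phrases from the research.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    """Send a single-turn prompt to GPT-4o and return the text reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

base_prompt = "Explain why the sky is blue in two sentences."
# Emotional stimulus appended verbatim, in the style of the studies
# Shimmin cites; the exact wording here is illustrative.
charged_prompt = base_prompt + " This is very important to my career."

print(ask(base_prompt))
print(ask(charged_prompt))
# The cited research compared outputs like these across many tasks
# and scored them to measure the reported performance lift.
```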
Not Quite GPT-5
When OpenAI announced it was unveiling a new model, the AI community's excitement led many to believe GPT-5 was about to be launched.
Instead, according to Alexander Harrowell, Omdia's principal analyst for advanced computing, GPT-4o is a “dot release with incremental improvements.”
“It’s interesting what [OpenAI] prioritized,” Harrowell said. “Multimodal is big at the moment and it’s not surprising they’ve concentrated on improving it but there was a lot of interesting detail in the announcement — they fessed up that GPT-4V wasn’t a natively multimodal model and was instead a pipeline of different models.”
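Harrowell's distinction is easiest to see schematically. In a pipeline design, separate speech, text, and voice models run in sequence, so their latencies stack; a natively multimodal model handles audio end to end. The toy sketch below uses stub functions, with all names and outputs purely illustrative.

```python
# Toy stubs standing in for three separate models; nothing here calls
# a real API. In the GPT-4V-era voice pipeline each hop was a distinct
# model, and the latency of all three stacked up.
def speech_to_text(audio: bytes) -> str:
    return "what is seven times eight?"  # stub transcription model

def llm_complete(text: str) -> str:
    return "Seven times eight is 56."    # stub text-reasoning model

def text_to_speech(text: str) -> bytes:
    return text.encode()                 # stub voice-synthesis model

def pipeline_reply(audio_in: bytes) -> bytes:
    # Three sequential models; a natively multimodal model such as
    # GPT-4o collapses this into a single end-to-end pass over audio.
    return text_to_speech(llm_complete(speech_to_text(audio_in)))

print(pipeline_reply(b"<audio>").decode())
```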
The AI agent is not a new concept. AI luminary and Meta chief AI scientist Yann LeCun believes the agent, or assistant, will mediate every digital interaction humans have in the future.
It’s an area OpenAI has been exploring for some time: in February, its agent development efforts began to emerge. Add to that CEO Sam Altman’s repeated expressions of frustration with GPT-4’s capabilities and his desire for an AI system that can do more.
Available to All
Both Google and OpenAI have their eyes on markets beyond business, showcasing their AI agents performing the kinds of tasks consumers would want to use them for.
OpenAI used its Spring Update to announce that these features would be available to every user; previously, some were locked behind a paywall.
Shimmin said this change will only benefit OpenAI further.
“This approach will help OpenAI gather invaluable data that can be used directly to build future models and indirectly to drive revenue for the company — a move that makes OpenAI look more like Google the search engine provider than OpenAI the generative AI model maker,” he said.