ChatGPT Gets Eyes, Ears and a Voice

ChatGPT users can now get help by showing it images and talking to it - in a back-and-forth conversation.

September 25, 2023

At a Glance

OpenAI announces new conversational voice and image capabilities for ChatGPT for more intuitive interactions.
Users can now speak to the AI chatbot and show it photos for assistance.

OpenAI has unveiled a major overhaul to ChatGPT, rolling out new voice and image capabilities that let the AI chatbot effectively see, hear and speak.

The Microsoft-backed company said the new capabilities offer a “more intuitive type of interface.” The improved ChatGPT will let users take photos and use them as prompts to get more information about what they show. Users can also highlight part of an image if their question refers only to that portion.

Want to learn more about the Eiffel Tower? Simply take a photo of it and use the image as a prompt. Need help with one question on a math homework assignment? Take a picture of the worksheet, circle the question that is causing trouble and have ChatGPT help answer it.

ChatGPT users also can now interact with the chatbot using their voice. Get recipe ideas or ask for a bedtime story using your own voice as input and hear the results spoken back.

The voice and image options are coming to ChatGPT Plus and Enterprise users over the next two weeks. Voice is coming to iOS and Android; users need to opt in via ‘settings.’ Image functionality is coming to all platforms.

OpenAI said it will give developers access to the voice and image options “soon after” release, though it offered no specific timeframe.

Understand images

ChatGPT’s new image functionality is powered by multimodal iterations of its GPT-3.5 and GPT-4 models.

To use the image option, upload one or several images and combine them with text prompts. To focus on a specific part of an image, ChatGPT mobile users can circle it using a drawing tool.

In one example, a bicycle rider asks ChatGPT for help lowering the bike seat, and ChatGPT instructs the rider to find the quick-release lever or bolt.

“Like other ChatGPT features, vision is about assisting you with your daily life. It does that best when it can see what you see,” OpenAI said.

Use your voice

The new voice functionality allows users to have back-and-forth conversations with ChatGPT, which is a level up from capabilities currently offered by consumer-grade AI assistants like Siri, Alexa and Google Home.

A new, unnamed text-to-speech model powers the voice capability. The model can generate human-like audio from just text and a few seconds of sample speech. OpenAI said it brought in professional voice actors to create each of the voices.

OpenAI’s Whisper speech recognition model is also used to transcribe users’ spoken words into text.

Users need to go to settings on their ChatGPT account and opt into voice conversations under the ‘new features’ tab. Users can also select their preferred voice from five options.

OpenAI revealed it is working with streaming giant Spotify on its voice chat feature; Spotify is using OpenAI’s technology to power the automatic translation of podcast content.

Are ChatGPT’s new voice and image options safe to use?

OpenAI said it has taken measures to limit risks, including working with third parties to understand use cases and limitations, technical limits on analyzing people in images, transparency about model limitations and advising against high-risk use cases.

OpenAI said it has been testing its image capabilities with a group of red teamers, who stress-tested them across various risks such as extremism and scientific inaccuracies. The capabilities have already been alpha-tested, meaning they went through early-stage internal testing before a beta test by a select group of targeted users.

Moreover, OpenAI said the new functionalities employ technical measures to "significantly limit" ChatGPT’s ability to analyze and make direct statements about people, since the chatbot "is not always accurate" and also should respect people's privacy. In June, OpenAI was sued after ChatGPT allegedly incorrectly accused gun rights advocate Mark Walters of "defrauding and embezzling funds" from a nonprofit in which he was the CFO. A journalist had asked ChatGPT to summarize a lawsuit filed by the nonprofit against Washington state Attorney General Robert Ferguson. Walters was not a party to this lawsuit.

OpenAI also acknowledged that ChatGPT “performs poorly with some other languages, especially those with non-roman script.”

“We advise our non-English users against using ChatGPT for this purpose,” the company said.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
