ChatGPT Can Now See, Hear, and Speak

OpenAI has just beefed up its flagship artificial intelligence offering by granting ChatGPT the ability to “see, hear, and speak” in a major new update.

The company announced in a blog post that, over the coming two weeks, they would be rolling out changes to the AI chatbot that vastly expand the range of ways users can interact with it.

Since its launch in November 2022, ChatGPT has been a text-based generative AI service. Now, voice capabilities will let you speak to ChatGPT and hear it respond aloud. Image capabilities will also allow you to take a photo of something and show ChatGPT what you’re talking about.

“Voice and image give you more ways to use ChatGPT in your life,” the company writes in their blog.

“Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it. When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow up questions for a step by step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you.”

It seems the company is seeking to keep ahead of competitors like Google, Amazon, and Apple, who are all attempting to rebuild their interactive ‘personal assistants’ around large language models. Google’s ‘Google Lens’ feature works in a similar way, allowing users to search by inputting images or speech, although OpenAI has had a head start on the underlying AI tech that makes it run well.

OpenAI has said that they worked with professional voice actors to create the five different voice options, and that their new text-to-speech model can create “human-like audio from just text and a few seconds of sample speech.” They’re also collaborating with Spotify to use the technology to translate podcasts into different languages while keeping the tone and sound of the original speaker’s voice.

However, the company is also keenly aware of the difficulties that they create every time they update their AI programmes. “These capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud,” they write. Because of these risks, they’re keeping the range of voice options limited.

In addition, their image-based search could be used to violate the privacy of individuals or assist in committing crimes, or it could simply give dangerous information in response. In attempting to mitigate these issues, OpenAI has said they’ve placed limits on the responses ChatGPT will give when presented with certain information.

It’s clear OpenAI is pushing its most famous creation towards becoming a fully-fledged virtual assistant. While they’re trying their best to embed safety in its development, past experience has shown that real-world use is very difficult to fully account for.

The company has said that Plus and Enterprise subscribers will get access to the new features in the next two weeks, while other groups, including developers, will get to use them “soon after.”

Read more stories from The Latch and subscribe to our email newsletter.