OpenAI unveils new ChatGPT that listens, watches and speaks

While Apple and Google are turning their voice assistants into chatbots, OpenAI is turning its chatbot into a voice assistant.

On Monday, the San Francisco artificial intelligence startup unveiled a new version of its ChatGPT chatbot capable of receiving and responding to voice commands, images and videos.

The company said the new app – based on an AI system called GPT-4o – juggles audio, images and video much faster than previous versions of the technology. The app will be available for free starting Monday on smartphones and desktop computers.

“We envision the future of interaction between us and machines,” said Mira Murati, the company’s chief technology officer.

The new app is part of a broader effort to combine conversational chatbots like ChatGPT with voice assistants like Google Assistant and Apple’s Siri. While Google is merging its Gemini chatbot with Google Assistant, Apple is preparing a new, more conversational version of Siri.

OpenAI said it would gradually share the technology with users “over the coming weeks.” It is also the first time the company has offered ChatGPT as a desktop application.

The company previously offered similar technologies across a range of free and paid products. Now it has consolidated them into a single system available across all its products.

In a webcast event, Murati and her colleagues demonstrated the new app as it responded to conversational voice commands, used a live video feed to analyze math problems written on a sheet of paper, and read aloud playful stories it had written on the fly.

The new application cannot generate video. But it can generate still images that represent frames from a video.

With the launch of ChatGPT in late 2022, OpenAI showed that machines can handle requests more like people do. In response to conversational text prompts, it could answer questions, write essays, and even generate computer code.

ChatGPT was not governed by a set of rules. It acquired its skills by analyzing huge amounts of text taken from the internet, including Wikipedia articles, books, and chat logs. Experts have hailed the technology as a possible alternative to search engines like Google and voice assistants like Siri.

Newer versions of the technology have also learned from sounds, images and video. Researchers call this “multimodal AI.” Essentially, companies like OpenAI have started combining chatbots with AI image, audio, and video generators.

(The New York Times sued OpenAI and its partner Microsoft in December, alleging copyright infringement over news content related to AI systems.)

As businesses combine chatbots and voice assistants, many obstacles remain. Because chatbots learn their skills from Internet data, they are prone to errors. Sometimes they make up the information entirely – a phenomenon AI researchers call “hallucination.” These flaws migrate to voice assistants.

Although chatbots can generate compelling language, they are less adept at taking actions such as scheduling a meeting or booking an airline flight. But companies like OpenAI are working to turn them into “AI agents” that can reliably handle such tasks.

OpenAI previously offered a version of ChatGPT capable of accepting voice commands and responding by voice. But it was a patchwork of three different AI technologies: one that converted voice to text, one that generated a text response, and one that converted that text into synthetic voice.
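A rough picture of that older three-stage pipeline can be sketched with the publicly documented speech-to-text, text, and text-to-speech endpoints. The snippet below is only an illustration, assuming the OpenAI Python SDK; the model names (“whisper-1”, “gpt-4”, “tts-1”) and the voice_reply helper are placeholders, not the internals of the product described here.

```python
# Illustrative sketch of a three-stage voice pipeline: speech-to-text,
# a text model, then text-to-speech. Model names are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def voice_reply(audio_path: str, output_path: str = "reply.mp3") -> str:
    # 1) Speech-to-text: transcribe the user's spoken request.
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )

    # 2) Text model: generate a written answer to the transcript.
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": transcript.text}],
    )
    answer = completion.choices[0].message.content

    # 3) Text-to-speech: turn the written answer back into audio.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
    speech.write_to_file(output_path)
    return output_path
```

Each hand-off between the three models adds latency, which is the bottleneck the single-model approach described below is meant to remove.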

The new app is based on a single AI technology – GPT-4o – capable of accepting and generating text, sounds and images. That means the technology is more efficient and the company can afford to offer it to users for free, Murati said.

“Before, you had all this latency that was the result of three models working together,” Ms. Murati said in an interview with The Times. “You want to have the experience that we have – where we can have this very natural dialogue.”
