A pair of Meta glasses takes a photo when you say, “Hey, Meta, take a photo.” A miniature computer that clips onto your shirt, the Ai Pin, translates foreign languages into your native language. An artificially intelligent screen features a virtual assistant that you speak to through a microphone.
Last year, OpenAI updated its ChatGPT chatbot to respond with spoken words, and recently, Google introduced Gemini, a replacement for its voice assistant on Android phones.
Tech companies are betting on a renaissance of voice assistants, several years after most people decided that talking to computers wasn’t cool.
Will it work this time? Maybe, but it could take a while.
Many people have still never used voice assistants like Amazon’s Alexa, Apple’s Siri, and Google’s Assistant, and the overwhelming majority of those who have say they never want to be seen speaking to them in public, according to studies conducted over the last decade.
I, too, rarely use voice assistants, and in my recent experience with Meta’s glasses, which include a camera and speakers to provide information about your surroundings, I concluded that talking to a computer in front of parents and their children at a zoo was still incredibly awkward.
I wondered if this would ever feel normal to me. Not long ago, talking on the phone with Bluetooth headsets made people seem crazy, but now everyone does it. Will we ever see lots of people walking around and talking to their computers like in science fiction movies?
I asked this question of design experts and researchers, and the consensus was clear: As new AI systems improve voice assistants’ ability to understand what we say and actually help us, we are likely to talk to devices more often in the near future – but it will still be many years before we do it in public.
Here’s what you need to know.
Why voice assistants are getting smarter
New voice assistants are powered by generative artificial intelligence, which uses complex statistics and algorithms to guess which words go together, much like your phone’s autocomplete feature. This makes them better able to use context to understand requests and follow-up questions than virtual assistants like Siri and Alexa, which can only answer a limited list of questions.
For example, if you say to ChatGPT: “What are the flights from San Francisco to New York next week?” – and continue with “What’s the weather like there?” and “What should I bring?” – the chatbot can answer these questions because it makes connections between words to understand the context of the conversation. (The New York Times sued OpenAI and its partner Microsoft last year for using copyrighted news articles without permission to train chatbots.)
An older voice assistant like Siri, which responds from a database of commands and questions it was programmed to understand, would fail unless you used specific phrasings, such as “What’s the weather like in New York?” and “What should I pack for a trip to New York?”
The first conversation seems smoother, like the way people talk to each other.
One of the main reasons people abandoned voice assistants like Siri and Alexa was that computers couldn’t understand much of what was asked of them — and it was difficult to know which questions worked.
Dimitra Vergyri, director of voice technology at SRI, the research lab behind the initial version of Siri before it was acquired by Apple, said generative AI solved many of the problems researchers had been grappling with for years. The technology makes voice assistants able to understand spontaneous speech and respond with helpful answers, she said.
John Burkey, a former Apple engineer who worked on Siri in 2014 and has been an outspoken critic of the assistant, said he believes that because generative AI makes it easier for people to get help from computers, more of us have been talking to assistants – and that when enough of us start doing it, it could become the norm.
“Siri was limited in scope – it only knew a limited number of words,” he said. “You have better tools now.”
But it may be years before the new wave of AI assistants is widely adopted, as it introduces new problems. Chatbots, including ChatGPT, Google’s Gemini, and Meta AI, are prone to “hallucinations,” which is when they make things up because they can’t come up with the right answers. They also blunder at basic tasks like counting and summarizing information from the web.
When voice assistants help – and when they don’t
Even as voice technology improves, speaking is unlikely to replace traditional keyboard-based interactions with computers, experts say.
People currently have compelling reasons to talk to computers in certain situations when they are alone, such as setting a destination on a map while driving a car. In public, however, not only can talking to an assistant still make you look weird, but more often than not, it’s impractical. When I wore the Meta glasses to a grocery store and asked them to identify a product, an indiscreet shopper cheekily responded, “It’s a turnip.”
You also wouldn’t want to dictate a confidential business email to other people on a train. Likewise, it would be inconsiderate to ask a voice assistant to read text messages aloud in a bar.
“Technology solves a problem,” said Ted Selker, a product design veteran who worked at IBM and Xerox PARC. “When do we solve problems and when do we create problems?”
Still, it’s easy to imagine times when talking to a computer helps you so much that you don’t care how strange it seems to others, said Carolina Milanesi, an analyst at Creative Strategies, a research firm.
On your way to your next office meeting, it would be helpful to ask a voice assistant to brief you on the people you were about to meet. When hiking on a trail, asking a voice assistant where to turn would be faster than stopping to pull up a map. When visiting a museum, it would be great if a voice assistant could give a history lesson about the painting you’re looking at. Some of these applications are already being developed with new AI technology.
When I tested some of the latest voice products, I got a glimpse of that future. While recording a video of myself baking a loaf of bread with the Meta glasses, for example, it was helpful to be able to say, “Hey, Meta, shoot a video,” because my hands were full. And having Humane’s Ai Pin dictate my to-do list was more convenient than stopping to stare at my phone screen.
“As you’re walking around, that’s the sweet spot,” said Chris Schmandt, who worked on voice interfaces for decades at the Massachusetts Institute of Technology’s Media Lab.
When he became an early adopter of one of the first cell phones about 35 years ago, he said, people stared at him as he walked around the MIT campus talking on the phone. Now it’s normal.
I have no doubt that a day will come when people will occasionally talk to computers while on the move – but it will come very slowly.