We already live in a world where virtual assistants can engage in a seamless (and even flirtatious) conversation with people. But Apple’s virtual assistant, Siri, struggles with some of the basics.
For example, I asked Siri when the Olympics take place this year and it quickly spit out the correct dates for the summer games. When I followed that up with, “Add it to my calendar,” the virtual assistant responded imperfectly with “What should I call it?” The answer to that question would be obvious to us humans. Apple’s virtual assistant was lost. Even when I responded, “Olympics,” Siri replied, “When should I schedule it for?”
Siri tends to falter because it lacks contextual awareness, which limits its ability to follow a conversation the way a human can. That could change as early as June 10, the first day of Apple’s annual Worldwide Developers Conference (WWDC). The iPhone maker is expected to unveil major updates to its upcoming mobile operating system, likely to be called iOS 18, with significant changes reportedly in store for Siri.
Apple’s virtual assistant made waves when it debuted with the iPhone 4S back in 2011. For the first time, people could talk to their phones and receive a human-like response. Some Android phones offered basic voice search and voice actions before Siri, but those were more command-based and widely considered to be less intuitive.
As such, Siri represented a leap forward in voice-based interaction and laid the groundwork for subsequent voice assistants such as Amazon’s Alexa, Google Assistant and even OpenAI’s ChatGPT and Google’s Gemini chatbots.
Move over Siri, multimodal assistants are here
Although Siri impressed people with its voice-based experience in 2011, its capabilities are seen by some as lagging behind its peers. Alexa and Google Assistant are adept at understanding and answering questions, and both have expanded into smart homes in different ways than Siri has. It just seems that Siri hasn’t lived up to its full potential, though its rivals have received similar criticism.
In 2024, it also faces a dramatically different competitive landscape that has been supercharged by generative AI. In recent weeks, OpenAI, Google and Microsoft have unveiled a new wave of futuristic virtual assistants with multimodal capabilities, which pose a competitive threat to Siri. According to NYU professor Scott Galloway on a recent episode of his podcast, these recent updates are poised to be the ‘Alexa and Siri killers.’
Earlier this month, OpenAI unveiled its latest AI model. The announcement underscored just how far virtual assistants have come. In its San Francisco demo, OpenAI showed off how GPT-4o could hold two-way conversations in even more human-like ways, complete with the ability to inflect tone, make sarcastic remarks, speak in whispers and even flirt. It quickly drew comparisons to Scarlett Johansson’s character in the 2013 Hollywood drama Her, in which a lonely writer falls in love with his female-sounding virtual assistant, voiced by Johansson. Following GPT-4o’s demo, the American actor accused OpenAI of building a virtual assistant voice that sounds “eerily similar” to her own — without her permission.
The controversy seemingly upstaged some GPT-4o features, such as its native multimodal capabilities, which mean the AI model can understand and respond to inputs beyond text, including pictures, spoken language and even video. In practice, GPT-4o can chat with you about a photo you show it (by uploading media), describe what’s happening in a video clip and discuss a news article.
The day after OpenAI’s preview, Google showed off its own multimodal demo, unveiling Project Astra — a prototype that the company has billed as the “future of AI assistants.” In a demo video, Google detailed how users can show Google’s virtual assistant their surroundings by using their smartphone’s camera and then proceed to discuss objects in their environment. For example, the person interacting with Astra at what was presumably Google’s London office asked Google’s virtual assistant to identify an object that makes a sound in the room. In response, Astra pointed out the speaker sitting on a desk.
Google’s Astra prototype can not only make sense of its surroundings but also remember details about them. When the narrator asked where they left their glasses, Astra was able to tell the user where they were last seen by responding with: “On the corner of the desk next to a red apple.”
The race to create flashy virtual assistants doesn’t end with OpenAI and Google. Elon Musk’s AI company, xAI, is making progress on turning its Grok chatbot into one with multimodal capabilities, according to public developer documents. In May, Amazon said it was working on giving Alexa, its decade-old virtual assistant, a generative AI upgrade.
Will Siri become multimodal?
Multimodal conversational chatbots currently represent the cutting edge of AI assistants, potentially offering a window into the future of how we navigate our phones and other devices.
Apple doesn’t have a digital assistant with multimodal capabilities to show for itself just yet, putting it behind the curve. The iPhone maker has, however, published research on the subject. In October, it introduced Ferret, a multimodal AI model that can understand what’s happening on your phone screen and perform a range of tasks based on what it sees. In the paper, researchers explore how it can identify and report on what you’re looking at and help you navigate apps, among other capabilities. The research points to a possible future in which the way we use our iPhones and other devices changes entirely.
Where Apple could stand out is in terms of privacy. The iPhone maker has long championed privacy as a core value when designing products and services, and it’ll bill the new version of Siri as a more private alternative to its competitors, according to the New York Times. Apple is expected to achieve this by processing Siri’s requests on-device and turning to the cloud for more complex tasks, but those will be processed in data centers with Apple-made chips, according to a Wall Street Journal report.
As for a chatbot, Apple is close to finalizing a deal with OpenAI to potentially bring ChatGPT to the iPhone, according to Bloomberg, in a possible indication that Siri will not be competing directly with ChatGPT or Gemini. Instead of doing things like writing poetry, Siri will home in on tasks it can already do and get better at those, according to the New York Times.
How will Siri change? All eyes on Apple’s WWDC
Apple has been intentionally slow to come to market, typically preferring a wait-and-see approach regarding emerging technology. This strategy has often worked, but not always. For instance, the iPad wasn’t the first tablet, but for many, including CNET editors, it is the best tablet. On the other hand, Apple’s HomePod smart speaker hit the market several years after the Amazon Echo and Google Home, and it never caught up to its rivals’ market share. A more recent example on the hardware side is foldable phones. Apple is the only major holdout. Its major rivals, including Google, Samsung, Honor and Huawei, and even lesser-known companies such as Phantom, beat Apple to the punch.
Historically, Apple has taken the approach of updating Siri in intervals, says Avi Greengart, lead analyst at Techsponential.
“Apple has always been more programmatic about Siri than Amazon, Google or even Samsung,” said Greengart. “Apple seems to add knowledge to Siri in bunches – sports one year, entertainment the next.”
With Siri, Apple is widely expected to play catch-up rather than break new ground this year. Still, Siri will likely be a major focus of Apple’s upcoming operating system, iOS 18, which is rumored to bring fresh AI features. Apple is expected to show off further AI integrations into existing apps and features including Notes, emojis, photo editing, messages and emails, according to Bloomberg.
As for Siri, it’s tipped to evolve into a more intelligent digital helper this year. Apple is training its voice assistant on large language models to improve its ability to answer questions with more accuracy and sophistication, according to the October edition of Mark Gurman’s Bloomberg newsletter Power On.
The integration of large language models, as well as the technology behind ChatGPT, is poised to transform Siri into a more context-aware and powerful virtual assistant. It would enable Siri not only to understand more complex and nuanced questions but also to provide accurate responses. This year’s iPhone 16 lineup is also expected to come with more memory to support new Siri capabilities, according to the New York Times.
“My hope is that Apple can use generative AI to give Siri the ability to feel more like a thoughtful assistant that understands what you are trying to ask, but use data-based systems for answers that are data bound,” Greengart told CNET.
Siri could also improve at performing multi-step tasks. A September report by the Information detailed how Siri might respond to simple voice commands for more complex tasks, such as turning a set of photos into a GIF and then sending it to one of your contacts. That would be a significant step forward in Siri’s capabilities.
“Apple also defines how iPhone apps work, so it has the ability to allow Siri to work across apps with the developer’s permission – potentially opening up new capabilities for a smarter Siri to securely accomplish tasks on your behalf,” Greengart said.
Editors’ note: CNET used an AI engine to help create several dozen stories, which are labeled accordingly. The note you’re reading is attached to articles that deal substantively with the topic of AI but are created entirely by our expert editors and writers. For more, see our AI policy.