Meet Gemini AI Live: Like FaceTiming with a friend who knows everything


Google teases advancements in Gemini’s multi-modal AI capability.

Kerry Wan/ZDNET

At its much-anticipated annual I/O event, Google announced exciting new functionality for its Gemini AI model, particularly its multi-modal capabilities, in a pre-recorded video demo.

Although it sounds a lot like the “Live” feature on Instagram or TikTok, Live for Gemini refers to the ability to “show” Gemini your view via your camera and have a two-way conversation with the AI in real time. Think of it as FaceTiming with a friend who knows everything about everything.

Also: Everything announced at Google I/O 2024: Gemini, Search, Project Astra, and more

This year has seen this kind of AI technology appear in a host of other devices like the Rabbit R1 and the Humane AI Pin, two non-smartphone devices that came out this spring to a flurry of hopeful curiosity, but ultimately didn’t move the needle away from the supremacy of the smartphone.

Now that these devices have had their moments in the sun, Google’s Gemini AI has taken the stage with its snappy, conversational multi-modal AI and brought the focus squarely back to the smartphone.

Google teased this functionality the day before I/O in a tweet that showed off Gemini correctly identifying the stage at I/O, then giving additional context to the event and asking follow-up questions of the user. 

In the demo video at I/O, the user turns on their smartphone’s camera and pans around the room, asking Gemini to identify its surroundings and provide context on what it sees. Most impressive were not simply the responses Gemini gave, but how quickly they were generated, which yielded the natural, conversational interaction Google has been trying to convey.

Also: 3 new Gemini Advanced features unveiled at Google I/O 2024

The goals behind Google’s so-called Project Astra center on bringing this cutting-edge AI technology down to the scale of the smartphone; that’s in part why, Google says, it created Gemini with multi-modal capabilities from the beginning. But getting the AI to respond and ask follow-up questions in real time has apparently been the biggest challenge.

During its R1 launch demo in April, Rabbit showed off similar multimodal AI technology that many lauded as an exciting feature. Google’s teaser video proves the company has been hard at work in developing similar functionality for Gemini that, from the looks of it, might even be better.

Also: What is Gemini Live? A first look at Google’s new real-time voice AI bot

Google isn’t alone in its multi-modal AI breakthroughs. Just a day earlier, OpenAI showed off its own updates during its OpenAI Spring Update livestream, including GPT-4o, its newest AI model, which now enables ChatGPT to “see, hear, and speak.” During the demo, presenters showed the AI various objects and scenarios via their smartphones’ cameras, including a handwritten math problem and the presenter’s facial expressions, and the AI correctly identified them through a similar conversational back-and-forth with its users.

Also: Google’s new ‘Ask Photos’ AI solves a problem I have every day

When Google updates Gemini on mobile later this year with this feature, the company’s technology could jump to the front of the pack in the AI assistant race, particularly with Gemini’s exceedingly natural-sounding cadence and follow-up questions. Although the exact breadth of its capabilities is yet to be fully seen, this development positions Gemini as perhaps the most well-integrated multi-modal AI assistant.

Folks who attended Google’s I/O event in person had a chance to demo Gemini’s multi-modal AI for mobile in a controlled “sandbox” environment at the event, but we can expect more hands-on experiences later this year.




