Google has had an eventful year already, rebranding its AI chatbot from Bard to Gemini and releasing several new AI models. At this year’s Google I/O developer conference, the company made several more announcements regarding AI and how it’ll be embedded across the company’s various apps and services.
Also: How to sign up for Google Labs (and 5 reasons why you should)
As expected, AI took center stage at the event, with the technology being infused across nearly all of Google products, from Search, which has remained mostly the same for decades, to Android 15 to, of course, Gemini. Here’s a roundup of every major announcement made at the event so far. And stay tuned for the latest updates.
1. Gemini
It wouldn’t be a Google developer event if the company didn’t unveil at least one new large language model (LLM), and this year, the new model is Gemini 1.5 Flash. This model’s appeal is that it is the fastest Gemini model served in the API and a more cost-efficient alternative to Gemini 1.5 Pro while still being highly capable. Gemini 1.5 Flash is available in public preview in Google AI Studio and Vertex AI starting today.
Even though Gemini 1.5 Pro launched just this February, it has been upgraded to provide better-quality responses in many areas, including translation, reasoning, and coding. Google shares that the latest version has achieved strong improvements on several benchmarks, including MMMU, MathVista, ChartQA, DocVQA, and InfographicVQA.
Also: Google I/O 2024: 5 Gemini features that would pull me away from Copilot
Furthermore, Gemini 1.5 Pro, with its one-million-token context window, will be available to consumers in Gemini Advanced. This is significant because it will allow consumers to get AI assistance on large bodies of work, such as PDFs that are 1,500 pages long.
As if that context window wasn’t already large enough, Google is previewing a two-million-token context window in Gemini 1.5 Pro and Gemini 1.5 Flash for developers through a waitlist in Google AI Studio.
Also: The best AI chatbots: ChatGPT and alternatives
Gemini Nano, Google’s model designed to run on smartphones, has been expanded to include images in addition to text. Google shares that starting with Pixel, applications using Gemini Nano with Multimodality will be able to understand sight, sound, and spoken language.
The Gemini sister family of models, Gemma, is also getting a major upgrade with the launch of Gemma 2 in June. The next generation of Gemma has been optimized for TPUs and GPUs and is launching at 27B parameters.
Lastly, PaliGemma, Google’s first vision-language model, is also being added to the Gemma family of models.
2. Google Search
If you have opted into the Search Generative Experience (SGE) via Search Labs, you are familiar with the AI overview feature, which places AI-generated summaries at the top of search results to give users conversational, abridged answers to their queries.
Now, using that feature will no longer be limited to Search Labs, as it is being made available to everyone in the U.S. starting today. The feature is made possible by a new Gemini model, customized for Google Search.
According to Google, since AI overviews became available through Search Labs, the feature has been used billions of times, and it has led people to use Search more and report greater satisfaction with their results. The implementation in Google Search is meant to provide a positive experience for users, with overviews appearing only when they can add to the results.
Also: The 4 biggest Google Search features announced at Google I/O 2024
Another significant change coming to Search is an AI-organized results page that uses AI to create unique headlines to better suit the user’s search needs. AI-organized search will begin to roll out to English-language searches in the U.S. related to inspiration, starting with dining and recipes, then movies, music, books, hotels, shopping, and more, according to Google.
Google is also rolling out new Search features that will first be launched in Search Labs. For example, in Search Labs, users will soon be able to adjust their AI overview to best suit their preferences, with options to break down information further or simplify the language, according to Google.
Users will also be able to use video to search, taking visual searches to the next level. This feature will be available soon in Search Labs in English. Lastly, Search can plan meals and trips with you starting today in Search Labs, in English, in the U.S.
3. Veo (text-to-video generator)
Google isn’t new to text-to-video AI models, having shared a research paper on its Lumiere model just this January. Now, the company is unveiling its most capable model to date, Veo, which can generate high-quality 1080p-resolution videos longer than a minute.
The model can better understand natural language to generate video that more closely represents the user’s vision, according to Google. It also understands cinematic terms like “timelapse” to generate video in various styles and give users more control over the final output.
Also: Meet Veo, Google’s most advanced text-to-video generator, unveiled at Google I/O 2024
Google shares that Veo builds on years of generative video work, including Lumiere and other models such as Imagen Video and VideoPoet. The model is not yet available to the public; however, select creators can access it in a private preview inside VideoFX, and the public is invited to join a waitlist.
This video generator seems to be Google’s answer to OpenAI’s text-to-video model, Sora, which is also not yet widely available and remains in private preview for red teamers and a select number of creatives.
4. Imagen 3
Google also unveiled its next-generation text-to-image generator, Imagen 3. According to Google, this model produces its highest-quality images yet, with more detail and fewer artifacts for more realistic results.
Like Veo, Imagen 3 has improved natural-language capabilities to better understand user prompts and the intention behind them. The model can also tackle rendering text, one of the biggest challenges for AI image generators, with Google saying Imagen 3 is its best model yet at the task.
Also: The best AI image generators: Tested and reviewed
Imagen 3 is not widely available just yet; it is in private preview inside ImageFX for select creators. The model will be available soon in Vertex AI, and the public can sign up to join a waitlist.
5. SynthID updates
As generative AI models become increasingly multimodal, Google is expanding SynthID, its technology for watermarking AI-generated images, to two new modalities: text and video. Accordingly, Google’s new text-to-video model, Veo, will include SynthID watermarks on all videos it generates.
6. Ask Photos
If you have ever spent what felt like hours scrolling through your photo library to find a specific picture, Google unveiled an AI solution to your problem. Using Gemini, users can enter conversational prompts in Google Photos to find the image they are looking for.
Also: Google’s new ‘Ask Photos’ AI solves a problem I have every day
In the example Google gave, a user wants to see their daughter’s progress as a swimmer over time, so they ask Google Photos that question, and it automatically packages the highlights for them. This feature is called Ask Photos, and Google shares that it will roll out later this summer, with more capabilities to come.
7. Gemini Advanced upgrades (featuring Gemini Live)
In February, Google launched a premium subscription tier to its chatbot, Gemini Advanced, which granted users access to bonus perks such as access to Google’s latest AI models and longer conversations. Now, Google is upgrading its subscribers’ offerings even further with unique experiences.
Also: What is Gemini Live? A first look at Google’s new real-time voice AI bot
The first, as mentioned above, is access to Gemini 1.5 Pro and its much larger context window of one million tokens, which Google says is the largest of any widely available consumer chatbot on the market. That larger window can be leveraged to upload bigger materials, such as documents of up to 1,500 pages or 100 emails. Soon, it will be able to process an hour of video and codebases with up to 30,000 lines.
Next, one of the most impressive features of the entire launch is Google’s Gemini Live, a new mobile experience in which users can have full conversations with Gemini, choosing from a variety of natural-sounding voices and interrupting it mid-conversation.
Later this year, users will also be able to use their camera with Live, giving Gemini context of the world around them for those conversations. Gemini uses video understanding capabilities from Project Astra, a project from Google DeepMind meant to reshape the future of AI assistants. For example, the Astra demo showed a user pointing out the window and asking Gemini what neighborhood they were likely in from what they saw.
Gemini Live is essentially Google’s take on OpenAI’s new Voice Mode in ChatGPT, which the company announced at its Spring Updates event yesterday, through which users can also carry out full-blown conversations with ChatGPT, interrupting mid-sentence, changing the chatbot’s tone, and using the user’s camera as context.
Taking another page from OpenAI’s book, Google is introducing Gems for Gemini, which accomplish the same goal as ChatGPT’s GPTs. With Gems, users can create custom versions of Gemini to suit different purposes. All a user needs to do is describe the task they want the chatbot to accomplish, and Gemini will create a Gem for that purpose.
Also: How to use ChatGPT (and what you can use it for)
In the coming months, Gemini Advanced will also include a new planning experience that can help users get detailed plans that take their own preferences into account, going beyond just generating an itinerary.
For example, with this experience, Google says Gemini Advanced could create an itinerary that fits the multi-stepped prompt, “My family and I are going to Miami for Labor Day. My son loves art, and my husband really wants fresh seafood. Can you pull my flight and hotel info from Gmail and help me plan the weekend?”
Lastly, users will soon be able to connect more Extensions to Gemini, including Google Calendar, Tasks, and Keep, allowing Gemini to perform tasks within each of those applications, such as adding a recipe you photographed to Keep as a shopping list, according to Google.
8. AI upgrades to Android
Several of today’s earlier announcements eventually (and unsurprisingly) trickled down to Google’s mobile platform, Android. To start, Circle to Search, which lets users perform a Google search by circling images, videos, and text on their phone screen, can now “help students with homework” (read: it can now walk you through equations and math problems when you circle them). Google says the feature will work with topics ranging from math to physics, and will eventually be able to process complex problems like symbolic formulas, diagrams, and more.
Also: The best Android phones to buy in 2024
Gemini will also replace Google Assistant, becoming the default AI assistant across Android phones and accessible with a long press of the power button. Eventually, Gemini will be overlaid across various services and apps, providing multimodal support when requested. Gemini Nano’s multimodal capabilities will also be leveraged through Android’s TalkBack feature, providing more descriptive responses for users who are blind or have low vision.
Lastly, if you accidentally pick up a spam call, Gemini Nano can listen in, detect suspicious conversation patterns, and prompt you to either “Dismiss & continue” or “End call.” Users will be able to opt into the feature later this year.
9. Gemini for Google Workspace updates
With all of the Gemini updates, Google Workspace couldn’t be left without an AI upgrade of its own. For starters, the Gemini side panel of Gmail, Docs, Drive, Slides, and Sheets will be upgraded to Gemini 1.5 Pro.
This is significant because, as discussed above, Gemini 1.5 Pro gives users a longer context window and more advanced reasoning, which users can now take advantage of within the side panel of some of the most popular Google Workspace apps for upgraded assistance.
This experience is now available for Workspace Labs and Gemini for Workspace Alpha users. Gemini for Workspace add-on and Google One AI Premium Plan users can expect to see it next month on desktop.
Gmail for mobile will gain three new helpful features: Summarize, Gmail Q&A, and Contextual Smart Reply. The Summarize feature does exactly what its name implies — it uses Gemini to summarize an email thread. This feature is coming to users starting this month.
Also: Google just teased AR smart glasses, and you can already see how the software works
The Gmail Q&A feature allows users to chat with Gemini about the contents of their emails within the Gmail mobile app. For example, in the demo, the user asked Gemini to compare roofer repair bids by price and availability. Gemini then pulled the information from several different emails and displayed it for the user.
Contextual Smart Reply is a smarter auto-reply feature that composes a reply using the context of the email thread and the Gemini chat. Both Gmail Q&A and Contextual Smart Reply will roll out to Labs users in July.
Lastly, the Help Me Write feature in Gmail and Docs is getting support for Spanish and Portuguese, coming to desktop in the coming weeks.
FAQs
When is Google I/O?
Google’s annual developer conference is here, taking place on May 14 and 15 at the Shoreline Amphitheatre in Mountain View, California. The opening day keynote, when Google leaders take the stage to unveil the company’s latest hardware and software, will begin at 10 AM PT / 1 PM ET.
How to watch Google I/O
Google will livestream the event on its main website and YouTube for members of the public and the press. You can register for the event on the Google I/O landing page for free to take advantage of perks such as receiving email updates and watching on-demand sessions. There will be an in-person element to I/O too, as has been the case for the past two years, with media and developers invited to attend. ZDNET will be among the crowd in Mountain View.