Google targets filmmakers with Veo, its new generative AI video model

It’s been three months since OpenAI demoed its captivating text-to-video AI, Sora, and now Google is trying to steal some of that spotlight. At its I/O developer conference on Tuesday, Google announced Veo, its latest generative AI video model, which it says can generate “high-quality” 1080p videos over a minute in length in a wide variety of visual and cinematic styles.

Veo has “an advanced understanding of natural language,” according to Google’s press release, enabling the model to understand cinematic terms like “timelapse” or “aerial shots of a landscape.” Users can direct their desired output using text, image, or video-based prompts, and Google says the resulting videos are “more consistent and coherent,” depicting more realistic movement for people, animals, and objects throughout shots.

Here are a few examples, but ignore the low resolution if you can — we had to compress the demo videos into GIFs.

Image: Google

Google DeepMind CEO Demis Hassabis said in a press preview on Monday that video results can be refined using additional prompts and that Google is exploring additional features to enable Veo to produce storyboards and longer scenes.

As is the case with many of these AI model previews, most folks hoping to try Veo out themselves will likely have to wait a while. Google says it’s inviting select filmmakers and creators to experiment with the model to determine how it can best support creatives and will build on these collaborations to ensure “creators have a voice” in how Google’s AI technologies are developed.

You can see here how the sun correctly reappears behind the horse and how the light softly shines through its tail.

Image: Google

Some Veo features will also be made available to “select creators in the coming weeks” in a private preview inside VideoFX, and there is a waitlist you can join for an early chance to try it out. Google also plans to add some of Veo’s capabilities to YouTube Shorts “in the future.”

Veo is one of several video generation models that Google has produced over the last few years, from Phenaki and Imagen Video (which produced crude, often distorted video clips) to the Lumiere model it showcased in January of this year. Lumiere was one of the most impressive models we’d seen before Sora was announced in February, and Google says Veo is even more capable of understanding what’s in a video, simulating real-world physics, rendering high-definition outputs, and more.

Meanwhile, OpenAI is already pitching Sora to Hollywood and planning to release it to the public later this year, having teased in March that it could be ready in “a few months.” The company is also looking to incorporate audio into Sora and may make the model available directly within video editing applications like Adobe’s Premiere Pro. Given that Veo is also being pitched as a tool for filmmakers, OpenAI’s head start could make it harder for Google’s project to compete.
