Google unveils Veo, a high-definition AI video generator that may rival Sora

Still images taken from videos generated by Google Veo.

Google/Benj Edwards

On Tuesday at Google I/O 2024, Google announced Veo, a new AI video synthesis model capable of creating HD videos from text, image, or video prompts, similar to OpenAI’s Sora. It can generate 1080p videos longer than a minute and edit videos from written instructions, but it has not yet been released for widespread use.

Veo reportedly can edit existing videos using text commands, maintain visual consistency across frames, and generate video clips lasting 60 seconds or longer from a single prompt or a series of prompts that form a narrative. The company claims it can generate detailed scenes and apply cinematic effects such as time-lapses, aerial shots, and various visual styles.

Since the launch of DALL-E 2 in April 2022, we’ve seen a parade of new image and video synthesis models that aim to let anyone who can type a written description create a detailed image or video. Although neither technology has been fully perfected, AI image and video generators are becoming increasingly capable.

In February, we previewed OpenAI’s Sora video generator, which many believed at the time represented the best AI video synthesis the industry had to offer. It impressed Tyler Perry enough that he put an expansion of his movie studio on hold. However, OpenAI has so far not provided general access to the tool; instead, it has limited use to a select group of testers.

Now, Google’s Veo appears at first glance to be capable of similar video generation feats to Sora. We haven’t tried it ourselves, so we can only rely on the hand-picked demo videos the company has provided on its website. This means that anyone viewing them should take Google’s claims with a huge grain of salt, as the generation results may not be typical.

Veo’s sample videos include a cowboy riding a horse, a fast-tracking shot down a suburban street, skewers roasting on a grill, a time-lapse of a sunflower opening, and much more. Detailed representations of humans are noticeably absent, something that has historically been difficult for AI image and video models to generate without obvious distortions.

Google says Veo builds on the company’s previous video generation models, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet, and Lumiere. To improve quality and efficiency, Veo works on compressed “latent” video representations, and its training data includes more detailed video captions, which Google says allow the model to interpret prompts more accurately.

Veo also appears notable for supporting filmmaking commands: “When given both an input video and an editing command, such as adding kayaks to an aerial shot of a coastline, Veo can apply this command to the initial video and create a new, edited video,” the company says.

Although the demos look impressive at first glance (especially when compared to Will Smith eating spaghetti), Google acknowledges that AI video generation is difficult. “Maintaining visual consistency can be a challenge for video generation models,” the company writes. “Characters, objects, or even entire scenes may flicker, jump, or transform unexpectedly between frames, disrupting the viewing experience.”

Google says it is mitigating these drawbacks with “state-of-the-art latent diffusion transformers,” which is basically meaningless marketing talk without details. But the company is confident enough in the model to be working with actor Donald Glover and his creative studio, Gilga, to create an AI-generated demo film that will debut soon.

Initially, Veo will be available to selected creators through VideoFX, a new experimental tool available on Google’s AI Test Kitchen website, labs.google. Creators can sign up for a waitlist for VideoFX to potentially access Veo features in the coming weeks. Google plans to integrate some of Veo’s features into YouTube Shorts and other products in the future.

It’s not yet clear where Google obtained the training data for Veo (if we had to guess, YouTube was probably involved). But Google claims to be taking a “responsible” approach with Veo. According to the company, “Videos created by Veo are watermarked using SynthID, our industry-leading tool for watermarking and identifying AI-generated content, and are passed through safety filters and memorization checking processes that help mitigate privacy, copyright, and bias risks.”

News source: arstechnica.com