a prompt describing characters and an environment, and suggest dialogue with a description of how you want it to sound.”
The wide availability of tools to build video generators has led to such an explosion of providers that the space is becoming saturated. Startups including Runway, Lightricks, Genmo, Pika, Higgsfield, Kling, and Luma, as well as tech giants such as OpenAI and Alibaba, are releasing models at a fast clip. In many cases, little distinguishes one model from another.
Audio output stands to be a big differentiator for Veo 3, if Google can deliver on its promises. AI-powered sound-generating tools aren’t novel, nor are models that create video sound effects. But according to Google, Veo 3 is unique in its ability to understand the raw pixels from its videos and automatically sync generated sounds with clips.
Veo 3 was likely made possible by DeepMind’s earlier work in “video-to-audio” AI. Last June, DeepMind revealed that it was developing AI tech to generate soundtracks for videos by training a model on a combination of sounds and dialogue transcripts as well as video clips.
DeepMind won’t say exactly where it sourced the content to train Veo 3, but YouTube is a strong possibility. Google owns YouTube, and DeepMind previously told TechCrunch that Google models like Veo “may” be trained on some YouTube material.
To mitigate the risk of deepfakes, DeepMind says it’s using its proprietary watermarking technology, SynthID, to embed invisible markers into frames Veo 3 generates.
While companies like Google pitch models such as Veo 3 as powerful creative tools, many artists are understandably wary of them, as they threaten to upend entire industries. A 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, estimated that more than 100,000 U.S. film, television, and animation jobs will be disrupted by AI by 2026.
Google also today rolled out new capabilities for Veo 2, including a feature that lets users give the model images of characters, scenes, objects, and styles for better consistency. The latest Veo 2 can understand camera movements like rotations, dollies, and zooms, and it allows users to add or erase objects from videos or broaden the frames of clips to, for example, turn them from portrait into landscape.
Google says that all of these new Veo 2 capabilities will come to its Vertex AI API platform in the coming weeks.