Google just announced Gemini Omni, a new AI model that it claims can “create anything from any input,” at its annual I/O developer conference on Tuesday.
The company said the model is starting off with just video generation and editing capabilities. On its website, Google says to think of it like “Nano Banana — but for video,” referencing the company’s image model that came out last year.
Gemini Omni Flash, the first model in the Omni family, can edit existing videos and generate new ones using plain-language prompts. It’s already available to try on the Gemini app, Google Flow AI studio, and YouTube Shorts.
“With Omni, you can combine images, audio, video and text as input and generate high-quality videos grounded in Gemini’s real-world knowledge. You can also easily edit your videos through conversation,” wrote Google DeepMind Chief Technology Officer Koray Kavukcuoglu in a blog post.
As with Nano Banana, users can make edits that build off each other through natural conversation. The model is designed to keep characters and environments consistent across edits and to use its knowledge of the real world, including history, biology, physics, and narrative logic, to make clips that actually make sense.
The company has posted several examples on its website of what the model can do in practice.
In one example, Google starts off with a video of a man touching a mirror. The model then creates several different versions of the clip based on text prompts describing what happens when the mirror is touched, such as “make the mirror ripple beautifully like liquid” and “the entire environment turns into 3d voxel art.”
Another example shows off the model’s audio capabilities. The video syncs the lights from an apartment building’s windows to a techno track.
The model was even able to create a short claymation-style explainer on protein folding.
But as with other video and image AI models, there are obvious concerns about abuse, including deepfakes and misinformation.
Google says the model was developed with input from its internal safety, security, and responsibility teams. The model also underwent a range of evaluations, including testing with specialists outside the model development team, to help ensure it follows safety policies and produces desired outcomes. Ethics and safety reviews were also conducted ahead of its release.
Additionally, Google says content created or edited with Omni will carry an invisible SynthID digital watermark, which is meant to make it easier to verify whether content was generated using the model.