23. Zeroscope

Several new models build on the ModelScope text-to-video model, including zeroscope_v2, a family of open-source models. zeroscope_v2 produces higher-quality video than ModelScope, and the XL version can upscale that video to 1024×576 resolution. The models are available on Hugging Face.
zeroscope_v2 XL, A watermark-free Modelscope-based video model capable of generating high quality video at 1024 x 576
Model on @huggingface : https://t.co/OK7IutQtE7
This model was trained with offset noise using 9,923 clips and 29,769 tagged frames at 24 frames, 1024×576…
— AK (@_akhaliq) June 24, 2023
Generating video with the model can take quite a long time if you’re using someone else’s public Hugging Face Space, but the results are pretty interesting on their own.
Although these models are advertised as much more capable than their predecessors, text-to-video is still in its very early stages. Give it a simple prompt like “Man walking through the forest,” and you’ll receive a rather simple rendition of what you asked for. Give it anything more esoteric or imprecise, and it will spit out some pretty wild visuals. To be clear, I like results that are a little stylistically strange, but people morphing into each other isn’t quite what I intended with my prompt.