Text to Video Generative AI Is Finally Here and It’s Weird as Hell

I like my AI like I like my foreign cheese varieties: incredibly weird and full of holes, the kind that leaves most definitions of "good" up to individual taste. So color me surprised when I explored the next frontier of public AI models and found one of the strangest experiences I've had since the bizarre AI-generated Seinfeld knockoff Nothing, Forever was first released.

Runway, one of the two startups that helped give us the AI art generator Stable Diffusion, announced on Monday that the first public test of its Gen-2 AI video model would go live soon. The company made the stunning claim that Gen-2 was the "first publicly available text-to-video model out there." Unfortunately, a more obscure group with a much jankier text-to-video model may have beaten Runway to the punch.

Google and Meta are already working on their own text-to-video generators, but neither company has shared much news since they were first teased. The relatively small, 45-person team at Runway is known for its online video editing tools, including its video-to-video Gen-1 AI model, released in February, which can transform existing videos based on text prompts or reference images. Gen-1 could turn a simple render of a stick figure swimming into a scuba diver, or turn a man walking down the street into a claymation nightmare with a generated overlay. Gen-2 is supposed to be the next big step up, letting users create 3-second videos from scratch based on simple text prompts. The company hasn't let anybody get their hands on it yet, but it has shared a few clips based on prompts like "a close up of an eye" and "an aerial shot of a mountain landscape."

Generate videos with nothing but words. If you can say it, now you can see it.

Introducing, Text to Video. With Gen-2.

Learn more at https://t.co/PsJh664G0Q pic.twitter.com/6qEgcZ9QV4

— Runway (@runwayml) March 20, 2023

Few people outside the company have been able to try Runway's new model, but if you're still hankering for AI video generation, there's another option. An AI text-to-video system called ModelScope was released over the past weekend and has already caused some buzz for its occasionally awkward and often insane 2-second video clips. The DAMO Vision Intelligence Lab, a research division of e-commerce giant Alibaba, created the system as a kind of public test case. According to the company's page describing the model, the system uses a pretty basic diffusion model to create its videos.

ModelScope is open source and already available on Hugging Face, though it may be hard to get running without paying a small fee for time on a separate GPU server. Tech YouTuber Matt Wolfe has a good tutorial on how to set that up. Of course, you could also run the code yourself if you have the technical skill and the VRAM to support it.

ModelScope is pretty blatant about where its data comes from. Many of its generated videos contain the vague outline of the Shutterstock logo, meaning the training data likely included a sizable portion of videos and images taken from the stock photo site. It's a similar issue with other AI image generators like Stable Diffusion. Getty Images has sued Stability AI, the company that brought that AI art generator into the public eye, pointing out that many Stable Diffusion images contain a corrupted version of the Getty watermark.

Of course, that still hasn't stopped some users from making small movies with the rather awkward AI, like this pudgy-faced Darth Vader visiting a supermarket, or this one of Spider-Man and a capybara teaming up to save the world.

As for Runway, the group is looking to make a name for itself in the ever-more crowded world of AI research. In the paper describing its Gen-1 system, Runway researchers said their model was trained on a "large-scale dataset" of both images and videos, pairing text-image data with uncaptioned videos. The researchers found there was simply a lack of video-text datasets of the same quality as the image datasets built from pictures scraped from the internet, which forced the company to derive its training signal from the videos themselves. It will be interesting to see how Runway's likely more polished take on text-to-video stacks up, especially once heavy hitters like Google show off more of their longer-form narrative videos.

If Runway's new Gen-2 waitlist is like the one for Gen-1, users can expect to wait a few weeks before they fully get their hands on the system. In the meantime, playing around with ModelScope may be a good first option for anyone looking for more weird AI interpretations. Of course, it's only a matter of time before we're having the same conversations about AI-generated videos that we now have about AI-created images.

The following slides are some of my attempts to compare Runway to ModelScope and to test the limits of what text-to-video can do. I converted the videos into GIF format using the same parameters for each. The frame rate of the GIFs is close to that of the original AI-created videos.
