Artists have trained a machine to reveal its psychedelic inner visions, unleashing human-Pixar hybrids which provoke profound fear, disgust, revulsion, nausea, and an immediate recognition of humanity’s highest potential. Dubbed “Toonify,” the project allows us to stare into the eyes of God. My only questions were what possessed the creators to make it, how does it work, and when will it come for me? (UPDATE 9/25/2020 2:30pm: It has come for us all.)
The first question was easy enough to answer: for machine learning explorers Justin Pinkney and Doron Adler, crossbreeding Robert De Niro with an unnamed species from the Monsters, Inc. bestiary was an exciting probe into the unknown. In much the same way you or I might learn an instrument or fix an old camera, these guys’ idea of a fun challenge was taking apart and putting back together machine learning models so that they can squish human heads into cartoon proportions. My second question required a bit more technical background.
Toonify runs on Pix2PixHD, an image-to-image translation model that converts images from one style to another—for example, it can turn a very rough sketch of a shoe into a wholly reimagined photographic rendering of that shoe. It may look like the “draw the rest of the owl” meme, but these drastic results hide the learning the model has done to get there.
But in order to train a Pix2PixHD model, you need to show it a ton of before-and-after images. To achieve diabolically Toonified results in the Pix2PixHD model, Pinkney and Adler needed to come up with a bunch of preexisting pairs of people and their matching toons. How did they find them? They could have asked Pixar, maybe. Instead, they created their own dataset by generating a lineage of toon-man abominations.
First, Doron Adler trained a StyleGAN model—the same tech behind This Person Does Not Exist, a site which randomly spits out photorealistic people that, as the name implies, are entirely computer generated—on Disney, Pixar, and Dreamworks characters so it could recognize which features are quintessentially cartoony. The model then automatically selected fake people from the This Person Does Not Exist universe and augmented them with those cartoon features. But StyleGAN globbed all the styles from computer-generated images, cartoons, and photographs together equally, which meant that the same person might have tufts of realistic hair, CGI meatball cheeks, and eerily flat hand-drawn eyes.
The model spat out a catalogue of wide-eyed but mushy creations trapped in visual purgatory:
This is where Justin Pinkney came in with his model, which they blended with Adler’s.
Pinkney developed a “layer-swapping” process to parse out the desirable characteristics from each image: the cartoon half affects only the structure of the resulting toonified face, while the human half contributes the lighting and other high-resolution details. (You can see how Pinkney previously did this with Ukiyo-e portraits here, with a nice simple explainer here.)
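To get a feel for what layer swapping means, here’s a toy sketch—not Pinkney’s actual StyleGAN code, and with made-up layer names—of the core move: take the coarse, low-resolution layers (which set the face’s structure) from the cartoon-trained model, and the fine, high-resolution layers (lighting, skin detail) from the original photo model.

```python
# Toy illustration of layer swapping: blend two models by picking which
# model supplies each layer, based on that layer's resolution.
# Layer names, resolutions, and the "weights" (strings here) are placeholders;
# real StyleGAN checkpoints hold tensors keyed by layer name.

def swap_layers(photo_weights, toon_weights, swap_below=64):
    """Structure (low-res layers) from the toon model, detail from the photo model."""
    blended = {}
    for name, weights in photo_weights.items():
        res = int(name.split("x")[0])  # e.g. "32x32_conv" -> 32
        blended[name] = toon_weights[name] if res < swap_below else weights
    return blended

photo = {"8x8_conv": "photo-low", "32x32_conv": "photo-mid", "256x256_conv": "photo-high"}
toon  = {"8x8_conv": "toon-low",  "32x32_conv": "toon-mid",  "256x256_conv": "toon-high"}

blended = swap_layers(photo, toon)
print(blended)  # coarse layers come from the toon model, fine layers from the photo model
```

The real version does the same thing with StyleGAN checkpoint weights: below some resolution cutoff, cartoon wins; above it, photorealism wins.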
Splendid! We’ve entered the realm of the damned.
The problem now is that, if you want to Toonify your own face, the layer swap method (which we’ll call Model Two) consumes a ton of time and compute power. This model doesn’t directly convert photographs—it works with encoded faces—which is why Adler chose “people” from the This Person Does Not Exist universe. It needs to “find” a very close approximation of your face to work off of. Here’s how it would find an almost perfect match to my Twitter avatar:
So the model itself still works pretty well, even if it’s not cartooning your actual face, but the search process takes a while.
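That “finding” step is the expensive part. Conceptually, it’s an optimization: search for a latent code whose generated face best matches your photo. Here’s a deliberately tiny stand-in—the “generator” is a one-number function rather than StyleGAN, and the loss is plain squared error rather than a perceptual loss—just to show the shape of the loop.

```python
# Toy stand-in for GAN inversion: find the latent z whose generated output
# is closest to the target. In reality z is high-dimensional, the generator
# is StyleGAN, and the loop runs thousands of steps on a GPU.

def generator(z):
    return 3.0 * z + 1.0  # placeholder for the generator network

def invert(target, steps=200, lr=0.01):
    z = 0.0
    for _ in range(steps):
        # gradient of the squared error (generator(z) - target)**2 w.r.t. z
        grad = 2.0 * (generator(z) - target) * 3.0
        z -= lr * grad
    return z

z = invert(target=10.0)
print(round(generator(z), 3))  # prints 10.0: the closest output the generator can produce
```

Hundreds of such steps per face is why Toonifying arbitrary photos with Model Two alone would be painfully slow.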
Now back to the shoe sketches and shoe photos. In order to train the final, photo-to-toon Pix2PixHD model (call it Model Three), they had Model Two randomly select 10,000 fake people and Toonify them. They showed Model Three all of these people/toon pairs, from which it learned that every toon needs to roughly match its photo, but the eyes need to look like pool balls.
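The pair-generation step itself is simple in outline. This sketch is a hypothetical, not the authors’ pipeline—`fake_person` and `toonify` are string-returning placeholders for StyleGAN sampling and the blended Model Two—but it shows how a paired training set gets assembled from nothing but random seeds.

```python
# Sketch of building the paired dataset: sample 10,000 fake faces,
# run each through the toonifying model, and keep (photo, toon) pairs.
import random

def fake_person(seed):
    random.seed(seed)
    return f"face_{random.randint(0, 10**6)}"  # placeholder for a StyleGAN sample

def toonify(face):
    return face.replace("face", "toon")  # placeholder for the blended Model Two

pairs = [(fake_person(s), toonify(fake_person(s))) for s in range(10_000)]
print(len(pairs))  # prints 10000: one before-and-after pair per sampled face
```

Because both halves of every pair are machine-generated, no studio ever had to hand over its character sheets.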
So now Model Three understands the look they’re going for, aaand:
There it is, the poison I crave.
As Pinkney explained, they trained the model on cartoons only lightly, so that the results would hew more toward the original human faces. One Twitter user showed what turning up the tooniness dial would look like. It’s bad!
Armed with a trove of data on human and animated facial features alike, can these researchers’ robot child be used in reverse? With a lot of extra time and effort, yes, but 1) why? And 2) Adler pointed out to Gizmodo that the result would probably look more like a Snapchat filter, with human features but washed-out cartoon skin textures. Generative networks tend to work better with more detail, Pinkney explained. And so we’ll have to leave that most pressing of questions, “what if Bart Simpson was made of meat,” unanswered.
Still, we continued to bug the hell out of them—because how bad could it look, really—and they said, sure, fine, they’ll Toonify a chihuahua and a potato that kind of looks like a face, even though they knew full well their hard work was in no way applicable to these sorts of inputs.
Try it yourself, and pray to your gods.