In a few short years, neural-network-powered automated face swaps have gone from being mildly convincing to eerily believable. Now, thanks to new research from Disney, neural face-swapping is poised to become a legitimate and high-quality tool for visual effects studios working on Hollywood blockbusters.
One of the bigger challenges of creating deepfake videos, as they’ve come to be known, is assembling a vast database of facial images of a person (thousands of different expressions and poses) that can be swapped into a target video. The larger the database and the higher the quality of the images, the better the face swaps will turn out. But the images, which are more often than not headshots of famous people, are usually pulled from sources with limited resolution. Even a 4K video file can yield low-res face images given how small faces often appear in the overall framing of a shot.
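To put rough numbers on that last point, here is a small back-of-the-envelope sketch. The function names are hypothetical, and the figures are assumptions chosen for illustration: a 4K UHD frame that is 2160 pixels tall, a face filling a tenth of the frame height, and a square megapixel crop taken to be about 1024 pixels on a side.

```python
def face_crop_height(frame_height_px, face_fraction):
    """Approximate height in pixels of a face that fills
    `face_fraction` of the frame's height."""
    return round(frame_height_px * face_fraction)


def upscale_factor(crop_height_px, desired_height_px):
    """How much a crop would need to be enlarged to reach the desired height."""
    return desired_height_px / crop_height_px


# Assumed values for illustration only: 4K UHD is 3840x2160,
# and the face occupies 10% of the frame height.
crop = face_crop_height(2160, 0.10)
print(crop)                                  # 216
print(round(upscale_factor(crop, 1024), 1))  # 4.7
```

Under those assumptions, the face is only about 216 pixels tall, so it would need to be enlarged nearly fivefold to fill a megapixel crop, and enlarging adds no real detail.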
So the first step to generating truly convincing deepfake videos is to start with a high-quality source. In a new paper being presented at the 2020 Eurographics Symposium on Rendering (yet another event being held online this year), “High-Resolution Neural Face Swapping for Visual Effects,” researchers from ETH Zurich and Disney Research Studios detail several innovations and approaches to automated face swaps that produce megapixel results with enough quality and resolution to be used for actual film production.
The new algorithm created by the researchers starts by actually modifying the source footage (the target video) to make it easier for alternate faces to be swapped in. The motion in the source footage is subtly stabilized and smoothed to eliminate problems, such as a slightly quivering lip, that could throw off the automated swapping process in a later step. The researchers also improved several other steps along the way, including the blending of the new face onto the original through improved compositing techniques to better match the overall contrast. The algorithm even does a much better job at generating the in-between frames needed to create smooth results so that the new face doesn’t appear to jump around when the altered footage is played back.
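The two ideas in that description, smoothing motion over time and matching contrast when compositing, can be illustrated with a minimal sketch. This is not the researchers' method, whose details aren't covered here. It assumes a facial point is tracked as one (x, y) position per frame and that a face's brightness can be summarized as a flat list of pixel values, and it swaps in two textbook techniques: a centered moving average and a mean and standard deviation match. All names are hypothetical.

```python
from statistics import mean, pstdev


def smooth_track(points, window=5):
    """Centered moving average over (x, y) positions, one per frame.

    Damps frame-to-frame jitter (say, a quivering lip corner).
    Near the ends of the clip the window shrinks to the frames available.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        neighbours = points[max(0, i - half):i + half + 1]
        n = len(neighbours)
        smoothed.append((sum(p[0] for p in neighbours) / n,
                         sum(p[1] for p in neighbours) / n))
    return smoothed


def match_contrast(face_values, target_values):
    """Shift and scale face brightness values so their mean and spread
    match those of the target footage (a simple stand-in for compositing)."""
    face_mean, face_std = mean(face_values), pstdev(face_values)
    target_mean, target_std = mean(target_values), pstdev(target_values)
    # A completely flat face region has no contrast to rescale.
    scale = target_std / face_std if face_std > 0 else 1.0
    return [target_mean + (v - face_mean) * scale for v in face_values]


# A point that jumps sideways for one frame is pulled back toward its neighbours.
print(smooth_track([(0.0, 0.0), (3.0, 0.0), (0.0, 0.0)], window=3))
```

A wider window smooths more aggressively but also risks flattening genuine expression changes, which is presumably why the stabilization described above is kept subtle.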
Every day there seems to be a new use for machine learning that promises to streamline and accelerate a task that has typically taken a long time to complete. Ever since the first deepfake videos started hitting the internet, visual effects artists have seen the potential for the work they do. Face swaps are not uncommon in the film and TV industry; oftentimes a stunt double will momentarily look at the camera, requiring extensive post-production to ensure that, even for a brief moment, the person on the screen looks exactly like who they’re supposed to be.
Fixing these problems can often require reshoots, or a combination of clever computer graphics and compositing, which is never cheap. With this new research, existing footage from the same shoot could be used to train the algorithm, which would then fix these problems all on its own. But while overworked visual effects artists and budget-conscious Hollywood producers might celebrate the new tool, it will also make it much harder to spot deepfake videos in the wild. It won’t take long for the new approaches in this research to find their way into existing machine-learning tools, at which point we can expect a new wave of deepfakes to flood the internet—and there’s now a good chance we won’t actually know they’re fake.