The rate at which deepfake videos are advancing is both impressive and deeply unsettling. But researchers have described a new method for detecting a “telltale sign” of these manipulated videos, which map one person’s face onto the body of another. It’s a flaw even the average person would notice: a lack of blinking.
Researchers from the University at Albany, SUNY’s computer science department recently published a paper titled “In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking.” The paper details how they combined two neural networks to more effectively expose synthesized face videos, which often overlook “spontaneous and involuntary physiological activities such as breathing, pulse and eye movement.”
The researchers note that the mean resting blink rate for humans is 17 blinks per minute, which increases to 26 blinks per minute when someone is talking, and decreases to 4.5 blinks per minute when someone is reading. The researchers add that these distinctions are worth paying attention to “since many of the talking-head politicians are probably reading when they are being filmed.” So when a subject in a video doesn’t blink at all, it’s an easy tell that the footage isn’t legit.
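To make the arithmetic concrete, here is a minimal sketch of how a blink rate could be estimated once each frame of a video has been labeled as eyes-open or eyes-closed. This is not the researchers' code: the function names, the per-frame boolean input, and the decision to count a blink as any open-to-closed transition are assumptions made for illustration. The reference rates in the dictionary are the figures quoted above.

```python
# Illustrative sketch only; names and input format are hypothetical,
# not taken from the paper.

# Mean blink rates (blinks per minute) as quoted in the article.
REFERENCE_RATES = {"resting": 17.0, "talking": 26.0, "reading": 4.5}


def count_blinks(eye_closed):
    """Count blinks as open-to-closed transitions in a per-frame sequence.

    eye_closed: iterable of booleans, True when the eye is closed in that frame.
    A run of consecutive closed frames counts as a single blink.
    """
    blinks = 0
    previously_closed = False
    for closed in eye_closed:
        if closed and not previously_closed:
            blinks += 1
        previously_closed = closed
    return blinks


def blinks_per_minute(eye_closed, fps):
    """Convert a per-frame eye-state sequence into blinks per minute."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    frames = list(eye_closed)
    if not frames:
        return 0.0
    duration_minutes = len(frames) / fps / 60.0
    return count_blinks(frames) / duration_minutes


if __name__ == "__main__":
    # One minute of synthetic footage at 30 fps with 17 three-frame blinks.
    frames = [False] * 1800
    for i in range(17):
        start = i * 100 + 10
        frames[start:start + 3] = [True] * 3
    rate = blinks_per_minute(frames, fps=30)
    print(f"Estimated rate: {rate:.1f} blinks per minute")
    print(f"Below reading rate: {rate < REFERENCE_RATES['reading']}")
```

A clip whose estimated rate sits far below even the reading figure would be worth a closer look, though where exactly to draw that line is a judgment call the sketch does not settle.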
There’s a reason subjects in deepfake videos don’t blink: Most training datasets fed to neural networks don’t include closed-eye photos, as photos of people posted online generally depict their eyes open. That’s consequential, given someone needs to collect plenty of photos of an individual in order to create a deepfake of them, and this can be done through an open-source photo-scraping tool which grabs publicly available photos of the target online.
Previous papers have pointed to the lack of eye-blinking as a way to detect deepfakes, but the University at Albany researchers say their system is more accurate than previously suggested detection methods. Earlier studies used the eye aspect ratio (EAR) or convolutional neural network (CNN)-based classifiers to detect whether eyes were open or closed. In this case, the researchers combined the CNN-based method with a recurrent neural network (RNN), an approach that considers previous eye states in addition to individual frames of video.
Unlike a purely CNN model, the researchers say their Long-term Recurrent Convolutional Network (LRCN) approach can “effectively predict eye state, such that it is more smooth and accurate.” According to the paper, this approach has an accuracy of 0.99, compared to CNN’s 0.98 and EAR’s 0.79.
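For readers curious about the simplest of the three methods, the eye aspect ratio is commonly computed from six landmark points around the eye: the sum of two vertical distances between the eyelids, divided by twice the horizontal distance between the eye corners. The sketch below follows that widely used six-landmark formulation. The function names, the landmark coordinates, and the 0.2 cutoff are assumptions for illustration, not values taken from the Albany paper.

```python
import math

# Illustrative sketch only; names, coordinates and threshold are hypothetical.


def eye_aspect_ratio(landmarks):
    """Compute the eye aspect ratio from six (x, y) eye landmarks.

    Landmarks are ordered p1..p6: p1 and p4 are the eye corners,
    p2 and p3 lie on the upper lid, p5 and p6 on the lower lid
    (p2 is opposite p6, p3 is opposite p5).
    """
    p1, p2, p3, p4, p5, p6 = landmarks
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    if horizontal == 0:
        raise ValueError("eye corners coincide; cannot compute ratio")
    return vertical / (2.0 * horizontal)


def is_eye_closed(landmarks, threshold=0.2):
    """Label an eye as closed when its ratio drops below a cutoff.

    The default threshold is an assumed value; in practice it would
    need tuning for a given landmark detector and dataset.
    """
    return eye_aspect_ratio(landmarks) < threshold


if __name__ == "__main__":
    open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
    closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
    print(f"Open eye ratio:   {eye_aspect_ratio(open_eye):.3f}")
    print(f"Closed eye ratio: {eye_aspect_ratio(closed_eye):.3f}")
```

Because this check looks at one frame at a time and depends on a fixed cutoff, it has no memory of what the eye was doing a moment earlier, which is the gap the researchers' recurrent approach is meant to close.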
At the very least, the researchers’ findings signal that the machine learning advances that enabled the creation of these ultrarealistic fake videos could have a hand in exposing them. But deepfakes are still improving alarmingly quickly. For instance, a new system called Deep Video Portraits lets a source actor manipulate the portrait video of someone else, and it allows for a number of physiological signals, including blinking and eye gaze.
It’s comforting to see experts looking for ways to tell real videos from fake ones, especially since bad actors will continue to abuse the technology to exploit women and potentially advance the spread of fake news. But it remains to be seen whether these detection methods will outpace the rapid advancement of deepfake tech and, more concerningly, whether the general public will even take the time to wonder whether the video they are watching is real or the product of an internet troll.
“In my personal opinion, most important is that the general public has to be aware of the capabilities of modern technology for video generation and editing,” Michael Zollhöfer, a visiting assistant professor at Stanford University who helped develop Deep Video Portraits, wrote in a blog post. “This will enable them to think more critically about the video content they consume every day, especially if there is no proof of origin.”
[h/t The Register]