After putting nearly all of its eggs in one virtual basket, Meta is now looking for ways to make the metaverse a place that people actually want to visit. It's starting by giving users a surprisingly easy way to digitize themselves as a virtual reality avatar using just the hardware in their smartphone.
Digital stunt doubles are more popular than ever in Hollywood, and not just as a safer way to make the action on screen more exciting. Recreating many of Marvel's comic book costumes for the movies can often only be achieved through digital doubles (assuming you don't want to pay unions). Creating such doubles currently requires large teams of talented visual effects artists and specialized equipment, all of which come together to digitize an actor's performance. That includes giant sound stages covered in tracking cameras, with actors having to wear special suits and makeup so their facial features can be accurately captured and reproduced. It's expensive and complicated, which is why most online avatars look like cartoonish caricatures that barely resemble the person they're representing. That, or you can commission an expensive VTuber rig, but even then you won't look like a real human being.
If users are really going to start spending more time in the metaverse, it has to be more engaging, and one way to make virtual experiences with friends more enjoyable is for their avatars to actually look like your friends. But no one wants to spend hours trying to recreate themselves in an elaborate avatar customizer, nor does Meta want to invite its billions of users to a VFX studio to get digitized. The better approach is to leverage the technology everyone already has access to, and for most Facebook users, that's a smartphone.
In a paper that will be presented at the Siggraph 2022 conference in Vancouver, British Columbia in August, a team of researchers from Meta's Reality Labs details a new approach to digitizing a human's appearance and then generating a fully 3D model capable of expressing a wide range of emotions: something the company has been working on for years. In 2019, Facebook researchers used a giant rig called Mugsy, featuring 171 high-res cameras arranged inside a sphere, to capture the imagery necessary to generate these 3D avatars. It recorded 180 GB of data every second, and required the person being digitized to sit in the center of the camera sphere for about an hour while reading scripts and making weird faces. It produced great results, but was simply not a practical way to digitize the masses.
Three years later, the Mugsy rig can be replaced with a recent-generation smartphone's front-facing selfie camera. Instead of spending an hour in a chair surrounded by hundreds of cameras, users simply have to pan their smartphone across their face, from side to side, and then recreate a series of 65 specific facial expressions. The researchers say the process now takes about three and a half minutes, and using a neural network that was previously trained on the 3D facial data captured from 255 diverse subjects inside a camera rig similar to Mugsy, the new approach can generate surprisingly lifelike 3D avatar models.
The process still isn't instant. Once the face scans and various expressions have been performed and captured, a computer with serious number-crunching capabilities still needs about six hours to render the results. But this is once again where the cloud shows its usefulness, as individual users won't require a high-end PC at home: all the rendering can be done somewhere else. The process also won't work if someone is wearing glasses, and it does a poor job of recreating long hair. It also only works on heads, so while the 3D avatars created might look realistic enough to finally escape the uncanny valley, interacting with nothing but the disembodied heads of your friends and family is still going to bring with it some level of creepiness.