You know those times you’re at a party you don’t really want to be at and staring at a blank wall is more appealing than socializing? To a high-res camera, that seemingly boring blank wall is actually a wealth of information that can be used to figure out how many people are in a room and what they’re doing.
Just a month after researchers at the Stanford Computational Imaging Lab revealed a technique in which a single laser beam fired into a room through a small hole in the wall could generate images of objects inside, a team from the Massachusetts Institute of Technology has published research on a different approach to figuring out what's going on in a room you can't actually see into: determining who's in there and what they're doing.
It's not always apparent to the naked eye, but in a room with even a single light source, every wall is bathed in shadows: some moving, some static, some soft, and some with hard edges. Depending on where the light source is located, visualizing those shadows and pulling any meaningful information from them can be almost impossible. The MIT researchers used a high-resolution video camera with excellent low-light performance (sensor noise has to be kept to a minimum) to capture enough footage of a blank wall that special processing techniques could not only detect the shadows' movements, but also extrapolate who was creating them.
The technique starts by pointing a video camera at a wall in a room where the people moving around are out of the camera's view; the camera sees only the wall. The captured footage is then averaged so that moving shadows cast by humans are eliminated while shadows cast by stationary objects, like furniture, are preserved, creating a baseline reference of what the illuminated wall looks like when no one's in the room.
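The averaging step is easy to picture in code. The sketch below is a toy illustration of the idea rather than the researchers' actual pipeline: the frame counts, wall pattern, and noise level are all made-up stand-ins. Averaging many frames cancels the faint moving shadows, and subtracting that baseline from each frame leaves only the signal produced by motion.

```python
import numpy as np

def background_baseline(frames: np.ndarray) -> np.ndarray:
    """Average a stack of video frames (T, H, W) so that moving shadows
    cancel out, leaving only the static wall and furniture shadows."""
    return frames.mean(axis=0)

def moving_shadow_residual(frames: np.ndarray, baseline: np.ndarray) -> np.ndarray:
    """Subtract the baseline from each frame; what remains is the faint
    signal cast by people moving around the room."""
    return frames - baseline[None, :, :]

# Toy demo: a static illumination gradient as the "wall," plus a small
# dark patch that sweeps across it, plus a little sensor noise.
rng = np.random.default_rng(0)
T, H, W = 200, 32, 32
wall = np.tile(np.linspace(0.4, 0.6, W), (H, 1))   # static illumination
frames = np.repeat(wall[None], T, axis=0)
for t in range(T):
    frames[t, 10:14, t % W] -= 0.05                # moving shadow
frames += rng.normal(0, 0.001, frames.shape)       # sensor noise

baseline = background_baseline(frames)             # ≈ the empty wall
residual = moving_shadow_residual(frames, baseline)
```

Because the moving patch only touches each pixel for a small fraction of the frames, the baseline converges to the static wall, and the residual isolates the motion.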
The researchers then did essentially the same thing in a different room with a different wall, but with specific numbers of people doing very specific actions and movements. This footage was used to train a neural network so that it was able to recognize what shadow movements were produced by what actions, and how many people were doing them. Once trained, the neural network was able to make these deductions on footage of any blank wall (it didn’t matter if it was completely clean or had details like door frames, light switches, etc.) with an impressive 94.4% accuracy when counting the people in the room, and 97.3% accuracy when recognizing their specific activity.
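The paper's actual model is a deep neural network trained on labelled shadow footage; as a rough, hypothetical illustration of the underlying idea (learn a mapping from residual shadow patterns to labels such as person count or activity), here is a toy nearest-centroid classifier on synthetic feature vectors. All names, dimensions, and data here are assumptions for the sketch, not from the research.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for residual-video feature vectors: each "activity"
# class produces features clustered around a different pattern.
n_classes, n_per_class, dim = 3, 40, 64
centers = rng.normal(0, 1, (n_classes, dim))
X = np.concatenate([c + 0.3 * rng.normal(0, 1, (n_per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# "Training": summarize each labelled class by its mean feature vector.
centroids = np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])

def classify(x: np.ndarray) -> int:
    """Assign a feature vector to the class with the nearest centroid."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

accuracy = np.mean([classify(x) == label for x, label in zip(X, y)])
```

A real system replaces the hand-rolled centroids with a trained network, but the workflow is the same: collect labelled examples, fit a model, then apply it to footage of walls it has never seen.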
Is it time to keep your blinds and curtains closed 24/7? Not quite yet. As with most approaches that rely on neural networks, the system has to be trained on the actions it's supposed to recognize, and while there isn't an infinite number of things one can do in a room, training an AI on every possibility simply isn't feasible. It could, however, be trained specifically on illicit activities, though it's hard to say whether the technique is smart enough to recognize someone handing over a suitcase full of money as a bribe.
The system can also be rendered completely useless by flickering light, like that from a candle or a television in the room: the constantly changing light intensity makes it impossible to distinguish shadows created by movement from shadows created by the flickering. Performance is also poor in dimly lit rooms, which is probably a limitation of current video camera hardware. So if you don't want anyone snooping on you, either keep the lights off or swap all your lamps for candelabras.