Imagery analysts who work for the military have a tough gig. They literally have to scan through hours of mind-achingly boring video in case something important happens. So wouldn't it be cool if the process could somehow be automated?
This is the thinking of the Pentagon and its advanced research wing, DARPA, which last year announced the Mind's Eye project, an attempt to create a camera with "visual intelligence." And as a new proposal from Carnegie Mellon researchers suggests, there may be a way to do it.
The researchers, Alessandro Oltramari and Christian Lebiere, have spec'd out the design for what they're calling a "high-level artificial visual intelligence system." Once operational, this system will not only be able to function as a conventional camera, it will also be able to recognize human activities and predict what might happen next. Should it encounter a potentially threatening scene or dangerous behavior, it could sound the alarm and notify a human agent.
The smart camera will work by classifying the actions it sees in a scene. It will do this using a linguistic infrastructure that operates in conjunction with a set of "action verbs," essentially a lengthy list of possible actions. Using state-of-the-art computer vision algorithms, the camera should then be able to discriminate between different actions in a scene and predict their outcomes.
And interestingly, to get the AI to predict these outcomes, Oltramari and Lebiere are essentially trying to approximate human-level visual intelligence. Humans evolved the ability to scan their environment for risks, at times relying on experience to guess correctly what a person might do next (often by looking at the context of that person's actions). By using a semantic system, along with a "cognitive engine," the researchers are trying to get their camera to do the same thing.
The cognitive engine will be powered by two components: partial matching (making probabilistic associations between two actions) and spreading of activation (making probabilistic associations between multiple contexts).
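To make those two ideas a little more concrete, here is a toy sketch in Python. It is not the researchers' system: the verb similarities, the stored activity patterns, the context cues, and the `context_weight` parameter are all made up for illustration. Partial matching scores how closely the observed action verbs resemble the verbs a stored pattern expects, and spreading activation adds a boost from whatever context cues are active.

```python
from dataclasses import dataclass, field

# Hypothetical similarity scores between action verbs (1.0 means identical).
SIMILARITY = {
    ("carry", "haul"): 0.8,
    ("dig", "bury"): 0.6,
    ("walk", "run"): 0.7,
}


def similarity(a, b):
    """Look up verb similarity symmetrically; unknown pairs score 0."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


@dataclass
class Pattern:
    """A stored activity: the verbs it expects and the contexts tied to it."""
    name: str
    verbs: tuple
    contexts: dict = field(default_factory=dict)  # cue -> association strength


def partial_match(observed, pattern):
    """Average, over expected verbs, of the best similarity to any observed verb."""
    if not pattern.verbs:
        return 0.0
    total = sum(
        max((similarity(expected, seen) for seen in observed), default=0.0)
        for expected in pattern.verbs
    )
    return total / len(pattern.verbs)


def spreading_activation(cues, pattern):
    """Sum the association strengths of the context cues that are active."""
    return sum(pattern.contexts.get(cue, 0.0) for cue in cues)


def score(observed, cues, pattern, context_weight=0.5):
    """Combine both signals; context_weight is an arbitrary illustrative choice."""
    return partial_match(observed, pattern) + context_weight * spreading_activation(cues, pattern)


def best_pattern(observed, cues, patterns):
    """Return the stored pattern with the highest combined score."""
    return max(patterns, key=lambda p: score(observed, cues, p))


# Two made-up activity patterns that share the verb "dig".
PATTERNS = [
    Pattern("planting a garden", ("dig", "carry"), {"daytime": 0.4, "field": 0.6}),
    Pattern("burying an object", ("dig", "bury"), {"night": 0.7, "roadside": 0.8}),
]

if __name__ == "__main__":
    seen = ["dig", "haul"]
    print(best_pattern(seen, ["night", "roadside"], PATTERNS).name)
    print(best_pattern(seen, ["daytime", "field"], PATTERNS).name)
```

In this toy setup the same pair of observed verbs leads to different interpretations depending on the surrounding cues, which is the kind of context sensitivity the article describes.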
For the next phase of their research, the team hopes to develop the reasoning component of the AI, and to equip it with statistical inference so it can derive and predict the goals of the agents that it's scanning. They're also looking to add more action verbs and to test the system with a large video dataset.
You can read the entire study at the Semantic Technology for Intelligence, Defense, and Security website.
Image: Winai Tepsuttinun/shutterstock.