When it first appeared in 1984, Montezuma’s Revenge was considered one of the most challenging video games on a home console. Now, in an effort to help machines learn more efficiently, AI researchers have created an algorithm that motivates the hero of this classic game to explore, and it’s surprisingly effective.
Anyone who has ever played Montezuma’s Revenge knows how frustrating the game can be. The authors of the new study describe the challenges this way:
Montezuma’s Revenge is infamous for its hostile, unforgiving environment: the agent [called Panama Joe] must navigate a maze composed of different rooms, each filled with a number of traps. The rewards are few and far between, making it almost impossible for undirected exploration schemes to succeed.
This vintage game is difficult for human players, let alone an artificial intelligence. To date, it has taken AI agents hundreds of millions of individual frames to reach even minimal levels of performance, and at best they’re capable of clearing only two or three of the game’s 72 rooms.
Google’s DeepMind division has been trying to solve Montezuma’s Revenge for quite some time now. Last year, Google announced that its Deep Q system was capable of mastering 49 Atari games simply by observing the screen and the score as it played. But Montezuma’s Revenge presents a different challenge entirely. As reported in Wired at the time, Deep Q was incapable of any kind of progress in the game, scoring “a big fat zero.” The issue, as pointed out by Dave Gershgorn in Popular Science, is that in order to succeed at this game, “players need to plan how to clear a room, and then execute that plan.”
To that end, in its latest effort to create an agent that can at least partially succeed at Montezuma’s Revenge, the DeepMind researchers endowed Panama Joe with what’s called “intrinsic motivation.”
Essentially, the protagonist of our digital adventure is trained to solve each level much as a human would, using novelty-based rewards. Panama Joe is “motivated” not only to win the game but also to explore more of it. In each episode he tries something different, and this often yields new solutions and, ultimately, success.
Of course, Joe isn’t really self-aware like a human player. Rather, he’s incentivized through a series of digital rewards, which helps him learn faster, and from just a few examples.
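To make the idea of novelty-based digital rewards concrete, here is a minimal sketch of one simple form of intrinsic motivation: a count-based exploration bonus that shrinks as a state becomes familiar. This is an illustrative assumption, not DeepMind’s actual method (which derives “pseudo-counts” from a learned density model over screens); the class name, the tabular hashable states, and the `beta` parameter are all hypothetical.

```python
import math
from collections import defaultdict

class NoveltyBonus:
    """Hypothetical count-based intrinsic reward: a sketch, not DeepMind's method."""

    def __init__(self, beta=1.0):
        self.beta = beta                 # weight of the exploration bonus (assumed)
        self.counts = defaultdict(int)   # visits per state; assumes hashable states

    def reward(self, state, extrinsic_reward):
        # The bonus decays like 1/sqrt(visits), so rarely seen states
        # (new rooms, untried routes) pay more than familiar ones.
        self.counts[state] += 1
        bonus = self.beta / math.sqrt(self.counts[state])
        return extrinsic_reward + bonus

# Usage: even with zero game score, visiting a new "room" yields reward,
# and revisiting it yields progressively less.
nb = NoveltyBonus(beta=0.5)
first = nb.reward("room-1", 0.0)
second = nb.reward("room-1", 0.0)
print(first, second)  # the first visit earns a larger bonus than the second
```

An agent trained on this combined reward is nudged toward unfamiliar states even when the game itself pays nothing, which is exactly the failure mode that stalled undirected exploration in Montezuma’s Revenge.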
In one example (shown in the video above), Panama Joe manages to solve an entire level in just four tries. And in a comparative analysis of agents programmed with and without this so-called artificial curiosity, the intrinsically motivated Joe explored 15 of 24 rooms, while the unmotivated Joe explored just two.
By working in the so-called Arcade Learning Environment, the researchers hope to produce algorithms that can be applied to the real world. In the future, similar motivations could help robots and other autonomous devices navigate and explore their surroundings.