Using a mix of unlabeled Minecraft videos and a small dataset of videos labeled by contractors, artificial intelligence company OpenAI was able to train a neural network to competently play Minecraft. That is a milestone for a technology that had previously struggled to crack the game’s simple but open-ended gameplay. OpenAI engineers revealed the experiment in a research paper and an accompanying blog post this week.
OpenAI’s model was able to move beyond basic crafting and survival and perform many of the same complex tasks a human Minecraft player would. In its blog post, OpenAI shows a video of the model swimming, hunting animals, and cooking them. It even figured out the game’s “pillar jumping” technique. Separately, DeepMind has trained its MuZero AI to play Atari games.
Earlier AI systems famously relied on various forms of reinforcement learning to beat classic games like chess and Go. Minecraft, on the other hand, though intuitive enough for young children to master, challenges AI systems because of its open world and open-ended structure.
While there’s a seemingly endless supply of Minecraft gameplay videos online, those videos tell only part of the story when it comes to training an AI to play the game. According to OpenAI, the flood of unlabeled video is good at demonstrating “what” to do, but it doesn’t include the exact key presses and mouse movements an AI needs to learn “how” to play.
The engineers solved this “how” problem with a semi-supervised imitation learning method they call “Video PreTraining,” or VPT. OpenAI first gathered a new, smaller dataset from contractors that included not just Minecraft gameplay footage but also a record of the key presses and other actions the contractors made. OpenAI then trained a separate model on that data, an inverse dynamics model, to infer which action was taken at each step of a Minecraft video by looking at the frames before and after it. With that basic know-how in place, the system could label a much larger collection of Minecraft videos from the internet with predicted actions, and the main agent was then trained to imitate those labeled videos. Rather than just dumping a torrent of raw data on their AI, the engineers took the time to teach it the fundamentals of basic inputs first.
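To make the two-stage idea concrete, here is a deliberately tiny toy sketch in Python. It is not OpenAI’s code: VPT uses large neural networks on raw video frames, while this example swaps in lookup tables and a one-dimensional “game” where an observation is just the player’s x-position and an action is a key press of -1 (left), 0 (stay), or +1 (right). All function names here (`play_episode`, `train_idm`, `pseudo_label`, `behavior_clone`) are hypothetical. The shape of the pipeline is the point: an inverse dynamics model learns from a small labeled set, it pseudo-labels a larger unlabeled set, and a policy is then trained by imitation on those labels.

```python
from collections import Counter, defaultdict

# Toy stand-in for Minecraft: an observation is the player's x-position,
# and an action is a "key press": -1 (left), 0 (stay), +1 (right).

def play_episode(start, length=20):
    """Simulate one gameplay 'video' of a demonstrator walking toward x = 10."""
    x, frames, actions = start, [], []
    for _ in range(length):
        a = 1 if x < 10 else (-1 if x > 10 else 0)
        frames.append(x)
        actions.append(a)
        x += a
    frames.append(x)  # final frame after the last action
    return frames, actions

def train_idm(labeled):
    """Inverse dynamics model: learn which action explains each frame-to-frame
    change. Unlike the policy, it is allowed to see the *next* frame."""
    votes = defaultdict(Counter)
    for frames, actions in labeled:
        for t, a in enumerate(actions):
            votes[frames[t + 1] - frames[t]][a] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in votes.items()}

def pseudo_label(idm, frames):
    """Use the IDM to infer the missing actions of an unlabeled video."""
    return [idm.get(frames[t + 1] - frames[t], 0) for t in range(len(frames) - 1)]

def behavior_clone(dataset):
    """Policy: predict the action from the current frame only (majority vote)."""
    votes = defaultdict(Counter)
    for frames, actions in dataset:
        for t, a in enumerate(actions):
            votes[frames[t]][a] += 1
    return {x: c.most_common(1)[0][0] for x, c in votes.items()}

# Small "contractor" set: videos WITH recorded key presses.
labeled = [play_episode(start) for start in (2, 17)]
# Larger "internet" set: frames only, actions discarded.
unlabeled = [play_episode(start)[0] for start in range(21)]

idm = train_idm(labeled)
pseudo = [(frames, pseudo_label(idm, frames)) for frames in unlabeled]
policy = behavior_clone(pseudo)
print(policy[3], policy[10], policy[15])  # → 1 0 -1
```

The learned policy walks right when left of x = 10, stays put at 10, and walks left otherwise, even though it never saw a single true action label from the larger dataset. In this toy, the two labeled videos are enough to cover every possible frame-to-frame change; in the real system, that coverage problem is why the contractor dataset still had to run to thousands of hours.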
“For many tasks our models exhibit human level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish,” OpenAI wrote in its research paper detailing the findings.
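The two figures in that quote imply a rate of actions per second. Working it out from the quoted numbers alone (this is our arithmetic, not a figure stated by OpenAI):

```latex
\frac{24{,}000\ \text{actions}}{20\ \text{min} \times 60\ \text{s/min}}
  = \frac{24{,}000\ \text{actions}}{1{,}200\ \text{s}}
  = 20\ \text{actions per second}
```

That lines up with Minecraft’s standard rate of 20 game ticks per second, which suggests the agent chooses a new action roughly every tick.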
All that training and contractor assistance reportedly resulted in a price tag of about $160,000. Most of that cash, according to ZDNet, went to paying out the contractors who collectively assembled around 4,500 hours of gameplay. The contractors were paid $20 per hour.
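Using only the reported figures, the contractor payments work out as follows:

```latex
4{,}500\ \text{h} \times \$20/\text{h} = \$90{,}000,
\qquad
\frac{\$90{,}000}{\$160{,}000} \approx 56\%
```

So contractor pay accounts for a little over half of the reported total. The figures cited here don’t break down the remaining roughly $70,000.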
You can see some footage of the AI chopping wood, managing its inventory, and scouring caves for yourself below.
If watching an AI that cost roughly the yearly salary of some surgeons play an 11-year-old indie game doesn’t seem all that impressive, it’s worth stepping back to see how far the tech has come. Just three years ago, teams competing in the MineRL competition were given a single, seemingly simple goal: create an AI that can successfully mine a diamond in Minecraft. Some 660 contestants reportedly took on the challenge, and every last one of them failed. OpenAI’s model can now craft diamond tools.
OpenAI also isn’t the only tech company turning to Minecraft for its AI experiments. Last month, during its Build conference, Microsoft revealed a new AI Minecraft “agent” that operates within the game. Users interacting with Microsoft’s Minecraft agent can type in natural-language commands, which are then automatically translated into code that calls the game’s software API. In practice, Wired notes, that means a user can type a phrase like “come here” and the bot will translate it into Minecraft code, resulting in the bot actually moving forward. Beyond walking, Microsoft’s agent can also complete more complex tasks like retrieving items in the game world and combining them to create something. And look, it can probably do that better and faster than this writer, who’s several years removed from his last Minecraft session.