Fancy algorithms capable of solving a Rubik’s Cube have appeared before, but a new system from the University of California, Irvine uses artificial intelligence to solve the 3D puzzle from scratch and without any prior help from humans—and it does so with impressive speed and efficiency.

New research published this week in Nature Machine Intelligence describes DeepCubeA, a system capable of solving any jumbled Rubik’s Cube it’s presented with. More impressively, it can find the most efficient path to success—that is, the solution requiring the fewest number of moves—around 60 percent of the time. On average, DeepCubeA needed just 28 moves to solve the puzzle, requiring 1.2 seconds to calculate the solution.

Sounds fast, but other systems have solved the 3D puzzle in less time, including a robot that can solve the Rubik’s cube in just 0.38 seconds. But these systems were specifically designed for the task, using human-scripted algorithms to solve the puzzle in the most efficient manner possible. DeepCubeA, on the other hand, taught itself to solve Rubik’s Cube using an approach to artificial intelligence known as reinforcement learning.

“Artificial intelligence can defeat the world’s best human chess and Go players, but some of the more difficult puzzles, such as the Rubik’s Cube, had not been solved by computers, so we thought they were open for AI approaches,” said Pierre Baldi, the senior author of the new paper, in a press release. “The solution to the Rubik’s Cube involves more symbolic, mathematical and abstract thinking, so a deep learning machine that can crack such a puzzle is getting closer to becoming a system that can think, reason, plan and make decisions.”

Indeed, an expert system designed for one task and one task only—like solving a Rubik’s Cube—will forever be limited to that domain, but a system like DeepCubeA, with its highly adaptable neural net, could be leveraged for other tasks, such as solving complex scientific, mathematical, and engineering problems. What’s more, this system “is a small step toward creating agents that are able to learn how to think and plan for themselves in new environments,” Stephen McAleer, a co-author of the new paper, told Gizmodo.

Reinforcement learning works the way it sounds. Systems are motivated to achieve a designated goal, during which time they gain points for deploying successful actions or strategies, and lose points for straying off course. This allows the algorithms to improve over time, and without human intervention.

Reinforcement learning makes sense for a Rubik’s Cube, owing to the hideous number of possible combinations on the 3x3x3 puzzle, which amount to around 43 quintillion. Simply choosing random moves with the hopes of solving the cube is simply not going to work, neither for humans nor the world’s most powerful supercomputers.

DeepCubeA is not the first kick at the can for these University of California, Irvine researchers. Their earlier system, called DeepCube, used a conventional tree-search strategy and a reinforcement learning scheme similar to the one employed by DeepMind’s AlphaZero. But while this approach works well for one-on-one board games like chess and Go, it proved clumsy for Rubik’s Cube. In tests, the DeepCube system required too much time to make its calculations, and its solutions were often far from ideal.

The UCI team used a different approach with DeepCubeA. Starting with a solved cube, the system made random moves to scramble the puzzle. Basically, it learned to be proficient at Rubik’s Cube by playing it in reverse. At first the moves were few, but the jumbled state got more and more complicated as training progressed. In all, DeepCubeA played 10 billion different combinations in two days as it worked to solve the cube in less than 30 moves.

“DeepCubeA attempts to solve the cube using the least number of moves,” explained McAleer. “Consequently, the moves tend to look much different from how a human would solve the cube.”

After training, the system was tasked with solving 1,000 randomly scrambled Rubik’s Cubes. In tests, DeepCubeA found a solution to 100 percent of all cubes, and it found a shortest path to the goal state 60.3 percent of the time. The system required 28 moves on average to solve the cube, which it did in about 1.2 seconds. By comparison, the fastest human puzzle solvers require around 50 moves.

“Since we found that DeepCubeA is solving the cube in the fewest moves 60 percent of the time, it’s pretty clear that the strategy it is using is close to the optimal strategy, colloquially referred to as God’s algorithm,” study co-author Forest Agostinelli told Gizmodo. “While human strategies are easily explainable with step-by-step instructions, defining an optimal strategy often requires sophisticated knowledge of group theory and combinatorics. Though mathematically defining this strategy is not in the scope of this paper, we can see that the strategy DeepCubeA is employing is one that is not readily obvious to humans.”

To showcase the flexibility of the system, DeepCubeA was also taught to solve other puzzles, including sliding-tile puzzle games, Lights Out, and Sokoban, which it did with similar proficiency.

“We applied our algorithm to a total of seven puzzles and found that it was able to solve them all. Therefore, this is evidence of a more generally applicable method,” said Agostinelli. “We believe that, given only a goal state and a method to work backwards from that goal state, not only can AI algorithms learn to find a path to the goal, it can learn to do so in the most efficient manner possible.”

From here, the UCI researchers would like to modify the DeepCubeA algorithm to perform other tasks, such as protein structure prediction, which could be useful for developing new drugs. They’d also like to use the system’s path-finding skills to help robots navigate more efficiently in complex environments.

## DISCUSSION