DeepMind’s new AI taps games to enhance fundamental algorithms

DeepMind has applied its mastery of games to a more serious business: the foundations of computer science.

The Google subsidiary today unveiled AlphaDev, an AI system that discovers new fundamental algorithms. According to DeepMind, the algorithms it’s unearthed surpass those honed by human experts over decades. 

The London-based lab has grand ambitions for the project. As demand for computation grows and silicon chips approach their physical limits, fundamental algorithms will need to become ever more efficient. By improving these building blocks, DeepMind aims to transform the infrastructure of the digital world.

The first target in this mission is sorting algorithms, which are used to order data. Under the hood of our devices, they help determine everything from search rankings to movie recommendations.

To improve their performance, AlphaDev worked at the level of assembly instructions, the low-level code from which a computer’s binary machine code is built. After an extensive search, the system uncovered a sorting algorithm that outperformed the previous benchmarks.
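To get a feel for the kind of routine involved, here is a minimal sketch of a three-element sorting network built from fixed compare-and-swap steps. It is written in Python purely for illustration; AlphaDev works on real assembly instructions, where each such step becomes a handful of instructions and shaving off even one matters.

    def sort3(a, b, c):
        # A three-element sorting network: three fixed compare-and-swap
        # steps, no loops. Illustrative only, not AlphaDev's actual code.
        if a > b:
            a, b = b, a
        if b > c:
            b, c = c, b
        if a > b:
            a, b = b, a
        return a, b, c

    print(sort3(3, 1, 2))  # (1, 2, 3)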

To find the winning combination, DeepMind had to revisit the feats that made it famous: winning board games.

Gaming the system 

DeepMind made its name in games. In 2016, the company grabbed headlines when its AI program defeated a world champion of Go, a wickedly complicated Chinese board game. 

Following the victory, DeepMind built a more general-purpose system, AlphaZero. Using a process of trial and error called reinforcement learning, the program mastered not only Go, but also chess and shogi (aka “Japanese chess”).

AlphaDev — the new algorithm builder — is based on AlphaZero. But the influence of gaming extends beyond the underlying model.

DeepMind formulated AlphaDev’s task as a single-player game. To win the game, the system had to build a new and improved sorting algorithm. 

The system played its moves by selecting assembly instructions to add to the algorithm. To find the optimal instructions, it had to search a vast number of possible combinations. According to DeepMind, that number was similar to the number of particles in the universe. And a single bad choice could invalidate the entire algorithm.

After each move, AlphaDev compared the algorithm’s output with the expected results. If the output was correct and the performance was efficient, the system got a “reward” — a signal that it was playing well.

“We penalise it for making mistakes, and we reward it for finding more and more of these sequences that are sorted correctly,” Daniel Mankowitz, the lead researcher, told TNW.
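As a rough illustration of that setup, here is a minimal, self-contained sketch of such a single-player game. It uses a toy action space of compare-and-swap moves instead of real assembly instructions, and plain random search stands in for AlphaDev’s AlphaZero-style reinforcement learning; every name in it is hypothetical rather than taken from DeepMind’s system.

    import itertools
    import random

    N = 3  # sort three values in this toy version
    MOVES = list(itertools.combinations(range(N), 2))             # candidate compare-and-swap steps
    TESTS = [list(p) for p in itertools.permutations(range(N))]   # every possible input order

    def run(program, values):
        # "Execute" a candidate program: apply its compare-and-swap moves in order.
        v = list(values)
        for i, j in program:
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
        return v

    def reward(program):
        # Reward correctly sorted outputs, penalise mistakes and longer
        # (i.e. slower) programs, echoing the scheme described above.
        correct = sum(run(program, t) == sorted(t) for t in TESTS)
        return correct - 0.1 * len(program)

    random.seed(0)
    candidates = (tuple(random.choices(MOVES, k=random.randint(1, 4))) for _ in range(2000))
    best = max(candidates, key=reward)
    print(best, reward(best))  # the shortest program that sorts every test case scores highest

In the real system, the search space is astronomically larger and the agent learns which instructions are worth trying next, rather than guessing at random.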

As you’ve probably guessed, AlphaDev won the game. But the system didn’t just find a correct, faster program. It also discovered novel approaches to the task.