Now scientists are exploring yet another final frontier of human achievement: a backlog of Atari platformers.
When it comes to navigating complex environments, video games are a perfect playground for testing out new-and-improved algorithmic approaches. In a study published Wednesday in the journal Nature, a team of researchers tested a family of algorithms they call Go-Explore on notoriously tricky Atari games, including Montezuma’s Revenge and Pitfall.
The researchers report that Go-Explore not only performed with “super-human” ability but also bested existing algorithms that had attempted to defeat these games.
Why it matters — Far beyond simply smashing dusty Atari cartridges, the authors suggest algorithms like Go-Explore, which are especially good at maneuvering through complex environments, could be the future of “generally intelligent agents” that can advance drug development, robotics, and more.
Here’s the background — When it comes to explaining the problems plaguing algorithms that explore twisting and turning worlds — like the algorithms inside your Roomba — a refrigerator is a great example of what can go wrong, the study team explains.
The key friction is between giving an A.I. too few rewards to complete a test (like moving toward a fridge) and supplying more frequent rewards that may be “deceptive.”
“[T]o guide a robot to a refrigerator, one might provide a reward only when the refrigerator is reached, but doing so makes the reward ‘sparse’ if many actions are required to reach the refrigerator,” write the authors. As a result, an algorithm might not be properly motivated to reach its goal.
Essentially, if an algorithm’s reward system is not perfectly outlined, it may fail its task altogether.
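To make the fridge example concrete, here is a minimal sketch of the two reward styles. The grid coordinates, the FRIDGE position, and both function names are hypothetical illustrations, not anything taken from the study.

```python
# Hypothetical kitchen grid: the robot starts at (0, 0), the fridge sits at (4, 0).
FRIDGE = (4, 0)

def sparse_reward(pos):
    """Pay out only when the fridge itself is reached."""
    return 1.0 if pos == FRIDGE else 0.0

def dense_reward(pos):
    """Pay out at every step: the closer to the fridge, the higher the reward.
    This uses grid distance and ignores walls, so it can be 'deceptive' if the
    shortest-looking route is actually blocked."""
    return -(abs(pos[0] - FRIDGE[0]) + abs(pos[1] - FRIDGE[1]))

# A straight walk toward the fridge.
path = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
sparse = [sparse_reward(p) for p in path]
dense = [dense_reward(p) for p in path]
```

Along this walk the sparse version stays silent until the very last step, which is why an algorithm relying on it gets little guidance. The dense version gives feedback at every step, but only as long as getting closer really is the right move.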
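Yet even within algorithms that better account for sparse or dense rewards, two problems remain. The study authors call them detachment, in which an algorithm loses track of promising areas it has already found, and derailment, in which its own exploratory moves keep it from getting back to those areas.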
To solve these problems, the team developed a memory trick to help Go-Explore remember where it’s been and explore a new environment without missing any hidden corners, kind of like embedding save points in its memory.
What they did — Go-Explore’s algorithm works by systematically cataloging every part of the environment it has visited in an easily accessible archive.
The algorithm also makes use of a ‘go and return’ approach, meaning it will always return to a previously explored location (similar to a save point) after exploring somewhere new. Taking this approach, instead of having the A.I. also explore on the way back, means it’s less likely to get lost along the way.
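The archive-plus-save-point idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the 5x5 grid world, the uniform choice of which saved cell to revisit, and every name in the code are hypothetical.

```python
import random

# Hypothetical toy world: a 5x5 grid; the agent always starts at (0, 0).
SIZE = 5
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(pos, action):
    """Deterministic move that keeps the agent inside the grid."""
    dx, dy = MOVES[action]
    x = min(max(pos[0] + dx, 0), SIZE - 1)
    y = min(max(pos[1] + dy, 0), SIZE - 1)
    return (x, y)

def replay(start, actions):
    """'Return' phase: follow a saved list of actions without exploring."""
    pos = start
    for action in actions:
        pos = step(pos, action)
    return pos

def go_explore(iterations=500, explore_steps=10, seed=0):
    rng = random.Random(seed)
    start = (0, 0)
    # Archive: every visited cell, mapped to the shortest known path to it.
    archive = {start: []}
    for _ in range(iterations):
        # Pick a save point (uniformly at random; an assumption of this sketch).
        cell = rng.choice(sorted(archive))
        path = list(archive[cell])
        pos = replay(start, path)  # first go back to the save point
        for _ in range(explore_steps):  # only then start exploring
            action = rng.choice(sorted(MOVES))
            pos = step(pos, action)
            path.append(action)
            # Record new cells, or a shorter route to a known cell.
            if pos not in archive or len(path) < len(archive[pos]):
                archive[pos] = list(path)
    return archive
```

Calling `go_explore()` returns the archive, and replaying any stored path from the start lands on the cell it is filed under. Replaying a saved path only works this simply because the toy world is deterministic; a noisy environment would need a more robust way of getting back.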
With this algorithm design in place, the A.I. was let loose in a group of 11 Atari games to put its abilities to the test. Ultimately, the A.I. processed 30 billion frames of information while playing these classic games.
In addition to testing the game environments, the researchers also built a simulated robot arm tasked with moving cups into locked cabinets to see how this algorithm might fare in the real world.
What they discovered — When it comes to scoring big on the Atari games, the authors report that their A.I. blew the competition out of the water.
“[T]he mean performance of Go-Explore is both superhuman and surpasses the state of the art in all 11 games,” write the authors. For Montezuma’s Revenge, in particular, the A.I. had a mean score of 1.7 million, far above the human high score of 1.2 million.
As for the simulated robot and cups, the authors report that the A.I. was able to quickly discover a successful trajectory for putting away the objects. However, while the A.I. did a good job exploring this realistic environment, it still failed to achieve certain tasks within it reliably, such as grasping the cups.
What’s next — Hitting new high scores on decades-old video games isn’t likely to save the world, but the authors write that this algorithm could prove useful in potentially life-saving fields, including advanced drug development. It’s possible the algorithm could apply similar exploration skills to a chemical landscape.
To reach these goals, future iterations of this research will need to improve the generality of the Go-Explore algorithms as well as their efficiency.