Google Unveils New Method to Improve Reinforcement Learning

Reusing data may improve chip design but is tough for NLP and computer vision

Ben Wodecki

November 9, 2022

2 Min Read

AI researchers at Google have come up with a novel way to improve reinforcement learning (RL) – reusing prior computational work.

In the paper, Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress, the team proposed reusing logged data or learned models across design iterations of an RL agent, or when transferring from one agent to another.

The researchers said reusing computational work could “significantly improve real-world RL adoption and help democratize it further.”

Reincarnating RL (RRL) is a “much (more) computationally efficient research workflow than tabula rasa RL and can help further democratize research,” according to the paper’s authors.

The paper was published ahead of the NeurIPS 2022 conference, with code available via GitHub.

Reducing computational costs for researchers

Reinforcement learning is a machine learning training method where desired behaviors are rewarded while unsought ones are punished. Effectively, it’s a trial-and-error method, with the system gradually learning its tasks and the environment around it. RL can be used to improve deployments across the likes of robotics, autonomous vehicles and dialogue agents.
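The trial-and-error loop described above can be sketched in a few lines. The following is a minimal illustration, tabular Q-learning on a toy five-state corridor, where the environment, reward values, and hyperparameters are assumptions chosen for demonstration and are not taken from the paper:

```python
import random

# Minimal sketch of trial-and-error RL: tabular Q-learning on a
# 5-state corridor. The agent is rewarded (+1) for reaching the
# rightmost state and receives 0 otherwise.

N_STATES = 5          # states 0..4; state 4 is the goal
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(state, action):
    """Apply an action; reward +1 only on reaching the goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    done = nxt == N_STATES - 1
    return nxt, reward, done

def train(episodes=500, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best-known action,
            # occasionally explore a random one (the "trial" part).
            if random.random() < EPS:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(s, x)])
            nxt, r, done = step(s, a)
            # Nudge the estimate toward reward plus discounted future value
            # (the "error" part of trial-and-error).
            target = r + (0.0 if done else GAMMA * max(q[(nxt, b)] for b in ACTIONS))
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            s = nxt
    return q

q = train()
```

After training, the learned values favor moving right in every state, which is the behavior the reward signal encourages.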

Most agent-based systems are developed using the tabula rasa method of RL, in that they are built from scratch without using previously learned knowledge about the problem.

Google’s research team argued that the tabula rasa RL method is “typically the exception rather than the norm for solving large-scale RL problems.” They contend that retraining large-scale systems is “prohibitively expensive,” especially considering many undergo multiple design changes and modifications.

“The inefficiency of tabula rasa RL research can exclude many researchers from tackling computationally-demanding problems,” a Google blog post by the authors reads.

Instead, the team contends that reusing prior computation would benefit researchers, since excessive computational resources would no longer be needed for each design iteration.
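The contrast between the two workflows can be sketched as follows: a tabula rasa agent starts with no prior knowledge, while a "reincarnated" agent is initialized from values a previous agent already learned before further training. The dict-based value table and function names below are illustrative assumptions, not the paper's actual API; the real workflows reuse logged data or learned network weights in the same spirit:

```python
# Illustrative sketch: tabula rasa vs. warm-started initialization.

def tabula_rasa_init(states, actions):
    """Start from scratch: every state-action value is zero."""
    return {(s, a): 0.0 for s in states for a in actions}

def reincarnate_init(states, actions, prior_values):
    """Reuse prior computation: copy values a previous agent learned."""
    values = tabula_rasa_init(states, actions)
    values.update({k: v for k, v in prior_values.items() if k in values})
    return values

# Suppose a previous design iteration already learned these values:
prior = {(0, "right"): 0.7, (1, "right"): 0.9}

fresh = tabula_rasa_init(range(3), ["left", "right"])
warm = reincarnate_init(range(3), ["left", "right"], prior)
```

The warm-started agent begins where the old one left off instead of relearning everything, which is the source of the computational savings the researchers describe.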

“RRL can enable a benchmarking paradigm where researchers continually improve and update existing trained agents, especially on problems where improving performance has real-world impact, such as (stratospheric) balloon navigation or chip design,” the Google researchers said.

The paper does state, however, that reincarnating reinforcement learning would be difficult for natural language processing (NLP) and computer vision, where pre-trained models are rarely, if ever, reproduced or retrained from scratch but are almost always used as-is.

“As reproducibility from scratch involves reproducing existing computational work, it could be more expensive than training tabula rasa, which beats the purpose of doing reincarnation,” the authors wrote.

