The conventional approach for improving the decision-making of deep reinforcement learning (RL) agents is to gradually amortize the useful information they gain from their experiences via gradient descent on training losses. This method, however, requires building ever-larger models to handle increasingly complex environments, and it is difficult to adapt to novel situations. Although adding information sources can benefit agent performance, there is currently no end-to-end solution that lets agents attend to information outside their working memory when choosing their actions.
In the new paper Large-Scale Retrieval for Reinforcement Learning, a DeepMind research team introduces a novel approach that dramatically expands the information accessible to RL agents, enabling them to attend to tens of millions of pieces of information, incorporate new information without retraining, and learn end-to-end how to use this information in their decision-making.
In the work, the team trains a semiparametric model-based agent to predict future policies and values conditioned on future actions in a given state, and adds a retrieval mechanism that lets the model draw on a large-scale dataset to inform its predictions.
The team faced two main challenges in using auxiliary information to improve agent predictions: 1) finding a scalable way to select and retrieve relevant data, and 2) identifying the most robust way to leverage that data in the model.
To scale the selection and retrieval process, the team draws on attention mechanisms: candidate data are scored by an inner product in an appropriate key-query space, and the top-N most relevant results are selected. Rather than learning the key and query embeddings end-to-end to optimize the final model predictions, the team learns a language-modelling-inspired embedding function via a surrogate procedure. The resulting frozen function is then used to represent domain-relevant similarity.
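The scoring step described above amounts to maximum inner-product search. The brute-force sketch below assumes the query and keys have already been produced by a frozen embedding function; the helper names `inner_product` and `top_n_by_inner_product` and the toy 3-d vectors are hypothetical illustrations, not code from the paper. Scanning every key like this is exact but scales linearly with the dataset, which is what becomes impractical at tens of millions of entries.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_n_by_inner_product(
    query: Sequence[float],
    keys: List[Sequence[float]],
    n: int,
) -> List[Tuple[int, float]]:
    """Exact (brute-force) top-N retrieval: score every key against the
    query and return the n highest-scoring (index, score) pairs, best first."""
    scored = ((i, inner_product(query, k)) for i, k in enumerate(keys))
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Hypothetical toy embeddings standing in for frozen key/query embeddings.
keys = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query = [0.9, 0.5, 0.1]
print(top_n_by_inner_product(query, keys, n=2))
```

Here the third key (index 2) scores highest because it aligns with both of the query's large components.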
The researchers note that scaling also forces the nearest-neighbour lookup to be approximate: finding the exact nearest neighbours is prohibitively time-consuming at inference time, but good approximations are possible. To leverage the retrieved data, the team provides the data associated with each nearest neighbour as additional input features that the model learns to interpret. This setup makes it easier for the model to adapt to large and complex environments.
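The two ideas above, approximate lookup and feeding neighbour data to the model as extra inputs, can be sketched together. This toy uses a coarse, IVF-style partition: keys are bucketed offline by their best-matching centroid, and at query time only the best-matching bucket(s) are scanned. That is one common family of approximate inner-product search, swapped in for illustration; it is not necessarily the index the DeepMind team used. The function names, the fixed centroids, and the per-neighbour `neighbour_data` values are all hypothetical.

```python
import heapq
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_partitions(keys: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Offline step: assign every key to the centroid it scores highest against."""
    buckets: Dict[int, List[int]] = {c: [] for c in range(len(centroids))}
    for i, vec in enumerate(keys):
        best = max(range(len(centroids)), key=lambda c: dot(vec, centroids[c]))
        buckets[best].append(i)
    return buckets


def approximate_top_n(query, keys, centroids, buckets, n, n_probe=1) -> List[int]:
    """Online step: score only the keys in the n_probe best-matching buckets.
    Much cheaper than a full scan, but true neighbours sitting in unprobed
    buckets can be missed, so the result is approximate."""
    probed = heapq.nlargest(n_probe, range(len(centroids)), key=lambda c: dot(query, centroids[c]))
    candidates = [i for c in probed for i in buckets[c]]
    return heapq.nlargest(n, candidates, key=lambda i: dot(query, keys[i]))


def augment_features(state_features: Vector, neighbour_ids: List[int],
                     neighbour_data: List[Vector]) -> List[float]:
    """Append the data stored with each retrieved neighbour to the current
    state's features; a downstream model would learn how to interpret them."""
    features = list(state_features)
    for i in neighbour_ids:
        features.extend(neighbour_data[i])
    return features


keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centroids = [[1.0, 0.0], [0.0, 1.0]]              # hypothetical, fixed for the demo
neighbour_data = [[0.5], [0.4], [-0.3], [-0.1]]   # hypothetical per-entry stored data

buckets = build_partitions(keys, centroids)
query = [0.8, 0.2]
ids = approximate_top_n(query, keys, centroids, buckets, n=2)
print(ids, augment_features(query, ids, neighbour_data))
```

Raising `n_probe` trades speed for recall: when it equals the number of buckets, every key is scored and the search reduces to the exact scan.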
In their empirical studies, the team evaluated the proposed retrieval method in the combinatorial state space of the traditional game of Go on a 9×9 board (≈10^38 legal positions). The results show that the model can effectively retrieve relevant data from a set of tens of millions of expert demonstration states and significantly improves prediction accuracy, demonstrating the potential of large-scale retrieval techniques for RL agents.
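The ≈10^38 figure corresponds to the number of legal 9×9 board positions, and a quick count shows where that order of magnitude comes from: each of the 81 intersections is empty, black, or white, which bounds the number of board configurations from above, and roughly a quarter of those configurations are legal.

```latex
3^{81} \approx 4.43 \times 10^{38}
\quad \text{(all colourings of the 81 intersections)}, \qquad
N_{\text{legal}}(9 \times 9) \approx 1.04 \times 10^{38} \approx 0.23 \times 3^{81}
```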
The paper Large-Scale Retrieval for Reinforcement Learning is on arXiv.
Author: Hecate He | Editor: Michael Sarazen