Combining reinforcement learning with deep learning has emerged as a promising approach for tackling challenging sequential decision-making problems. Such systems, however, require huge amounts of data for training. DeepMind and McGill University researchers believe the problem could be addressed through a strategic “divide-and-conquer” approach.
Reinforcement learning (RL) teaches agents how to behave while interacting with an environment, providing a conceptual framework for addressing fundamental problems in AI. Combining RL with deep learning has enabled impressive achievements, producing efficient algorithms that can be applied in areas such as robotics, video games, and computer vision.
Unlike humans, RL agents usually learn to perform a task essentially from scratch, which often requires an enormous amount of data before the agent can correctly perform an assigned task. It is believed RL agents could handle a much wider range of problems if, instead of learning everything from scratch, they were provided with appropriate mechanisms for leveraging prior knowledge.
“We argue that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel,” the DeepMind and McGill researchers explain in their paper Fast Reinforcement Learning With Generalized Policy Updates, published in PNAS (Proceedings of the National Academy of Sciences of the United States of America).
The team proposes associating each task with a reward function to enable this decomposition to be seamlessly accommodated within the standard RL formalism. Through the generalization of two fundamental operations in RL — policy evaluation and policy improvement — the method reduces RL tasks to much simpler problems that can be solved using only a fraction of the data.
The resulting generalized policy evaluation (GPE) and generalized policy improvement (GPI) operations can be used to deliver faster solutions to RL problems. The researchers present several extensions of their basic framework, which can be expanded to other frameworks built upon GPE and GPI. Together, these provide a conceptual toolbox that enables the decomposition of an RL problem into tasks whose solutions basically inform each other.
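To make the GPE and GPI operations concrete, here is a minimal numpy sketch in the spirit of the paper's successor-features formulation. All names, shapes, and sampled values are illustrative assumptions, not the authors' code: it assumes each previously learned policy comes with successor features `psi` (the expected discounted sum of state features under that policy), so that evaluating any policy on a new task reduces to a dot product with the task's reward weights `w`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_policies, n_actions, d = 3, 4, 5

# psi[i, a]: successor features of known policy i for action a
# in the current state (illustrative random values)
psi = rng.random((n_policies, n_actions, d))

# w: reward weights of the new task, so r(s, a) ~ phi(s, a) . w
w = rng.random(d)

# GPE: evaluate every known policy on the new task at once,
# each evaluation is just a dot product with w
q = psi @ w                      # shape (n_policies, n_actions)

# GPI: act greedily with respect to the best of all evaluated policies
action = int(np.argmax(q.max(axis=0)))
print(action)
```

The key point the sketch illustrates is that neither step requires new environment interaction: GPE reuses the stored successor features, and GPI combines the resulting value estimates into a policy at least as good as any of the originals on the new task.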
The approach can reduce RL problems to simpler linear regressions if the reward function of a task can be well approximated as a linear combination of the reward functions of tasks previously solved. Even if this is not the case, agents can still exploit task solutions by using them to interact with and learn about the environment. In both cases, the approach considerably reduces the amount of data needed to solve RL problems.
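The linear-regression reduction described above can be sketched as follows. This is a hedged illustration with made-up data, not the paper's implementation: it assumes we have observed, on the same transitions, the rewards of several previously solved tasks (columns of `R`) and of a new task (`r_new`), and fits the combination weights by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_tasks = 100, 3

# R[:, i]: rewards of previously solved task i on shared transitions
R = rng.normal(size=(n_samples, n_tasks))

# Simulate a new task whose reward is (noisily) a linear
# combination of the old tasks' rewards
true_w = np.array([0.5, -1.0, 2.0])
r_new = R @ true_w + 0.01 * rng.normal(size=n_samples)

# Solve the linear regression r_new ~ R w for the combination weights;
# these weights then let GPE evaluate old policies on the new task
w, *_ = np.linalg.lstsq(R, r_new, rcond=None)
print(np.round(w, 2))
```

Once `w` is recovered, the new task's values can be estimated from the old tasks' solutions without further environment samples, which is where the data savings come from.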
The researchers believe their “divide-and-conquer” approach to RL, when combined with deep learning, has the potential to scale up agents to a range of problems that are currently out of reach.
The paper Fast Reinforcement Learning With Generalized Policy Updates is available on PNAS, and the source code is on GitHub.
Reporter: Yuan Yuan | Editor: Michael Sarazen
This report offers a look at how China has leveraged artificial intelligence technologies in the battle against COVID-19. It is also available on Amazon Kindle. Along with the report, we also introduced a database covering an additional 1,428 artificial intelligence solutions from 12 pandemic scenarios.