Deep reinforcement learning (RL) has achieved superhuman performance in fully observable environments such as Atari games and Go. In real-world scenarios, however, rewards tend to be sparse rather than dense, so an agent is unlikely to obtain one through random exploration and needs many interactions before it learns anything useful. This low sample efficiency has limited the training and performance of RL agents for practical applications in real-world environments.
Researchers have previously proposed curriculum learning to improve sample efficiency, with MILA founder Yoshua Bengio demonstrating in 2009 that curriculum learning can boost network performance across a number of tasks. Now, Bengio and researchers from Université de Montréal and the École Normale Supérieure in Paris have made further progress, introducing a new curriculum learning algorithm based on the notion of mastering rate. The approach improves on a method the team first proposed in a 2019 ICLR paper, and significantly outperforms previous learning-progress-based algorithms on various reinforcement learning and supervised learning tasks.
Curriculum learning begins with only easy examples of a task, where the agent can get rewards more quickly, then gradually increases task difficulty using the previously learned policy for training. It can be broken down into two parts:
- Defining the curriculum, i.e. the set of tasks the learner may be trained on.
- Defining the program, i.e. given its learning state and the curriculum, deciding which tasks to train the learner on at each training step.
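The two parts can be pictured as a small interface. The following Python sketch is illustrative only and is not taken from the paper: the type names, the `uniform_program` baseline and the `sample_task` helper are all hypothetical, and the learning state is assumed to be a simple mapping from task name to the learner's latest return.

```python
import random
from typing import Callable, Dict, List

# Hypothetical types: a curriculum is a list of task names, and the
# learning state maps each task name to the learner's latest return.
Curriculum = List[str]
LearningState = Dict[str, float]
# A program maps (curriculum, learning state) to sampling probabilities.
Program = Callable[[Curriculum, LearningState], List[float]]


def uniform_program(curriculum: Curriculum, state: LearningState) -> List[float]:
    """Baseline program (an assumption): ignore the state, weight tasks equally."""
    n = len(curriculum)
    return [1.0 / n] * n


def sample_task(curriculum: Curriculum, state: LearningState,
                program: Program, rng: random.Random) -> str:
    """Pick the task for the next training step according to the program."""
    probs = program(curriculum, state)
    return rng.choices(curriculum, weights=probs, k=1)[0]
```

Any smarter program only has to replace `uniform_program`; the curriculum and the sampling step stay the same.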
The main difference between Bengio’s and previous work in this area is program design. Traditional approaches relied on hand-designed programs, which either allowed the learner to advance to the next task once it reached a predefined performance threshold or, in the case of supervised learning, increased the share of training examples drawn from harder tasks. These approaches, however, required many model iterations and could also result in catastrophic forgetting for learners.
The new work focuses on program algorithms: given a curriculum and the learning state of the learner, these algorithms decide which tasks to train the learner on next. The researchers refer to their proposal as the mastering-rate-based (MR) algorithm.
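A hand-designed, threshold-based program of the kind described above might look like the following Python sketch. The function name `next_stage` and its parameters are hypothetical, and returns are assumed to be normalized so that a single threshold is meaningful.

```python
def next_stage(stage: int, latest_return: float,
               threshold: float, num_tasks: int) -> int:
    """Advance to the next task once the return on the current one
    reaches the threshold; never move back to an earlier task."""
    if latest_return >= threshold and stage < num_tasks - 1:
        return stage + 1
    return stage
```

Because the stage index in this sketch only ever increases, earlier tasks stop being trained on once they are passed, which is one way forgetting could arise.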
There are two main contributions in the new approach:
- A new teacher-student algorithm that is not only simpler to use but also improves performance in terms of stability and sample complexity.
- An assumption that the good next tasks are those that are learnable but not yet learned. This addresses a shortcoming of previous teacher-student algorithms, where the learner may train mainly on tasks it has already learned or cannot yet learn.
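To make the "learnable but not learned yet" idea concrete, here is a deliberately simplified Python sketch. It is not the paper's formula: it assumes a linear curriculum in which each task's only prerequisite is the previous task, and it assumes a mastering rate in [0, 1] is already available for every task. The function name `attention` is hypothetical.

```python
from typing import List


def attention(mastering_rates: List[float]) -> List[float]:
    """Weight each task by (how learnable it is) * (how unlearned it is).

    Assumptions: tasks are ordered from easiest to hardest, a task is
    learnable to the degree its predecessor is mastered, and the first
    task is always learnable.
    """
    scores = []
    for i, rate in enumerate(mastering_rates):
        learnable = 1.0 if i == 0 else mastering_rates[i - 1]
        not_learned = 1.0 - rate
        scores.append(learnable * not_learned)
    total = sum(scores)
    if total == 0.0:
        # Everything is mastered: fall back to uniform sampling.
        n = len(scores)
        return [1.0 / n] * n
    return [s / total for s in scores]
```

With mastering rates of `[1.0, 0.5, 0.0]`, the first task gets no weight because it is already learned, while the second and third tasks share the weight equally.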
The researchers used three curricula to evaluate the algorithms:
- The BlockedUnlockPickup curriculum: a sequence of three tasks with increasing difficulty.
- The KeyCorridor curriculum: a sequence of six tasks with increasing difficulty.
- The ObstructedMaze curriculum, comprising six tasks that require different abilities.
The above two figures show the median return during training, with a confidence band spanning the first and last quartiles; the x-axis represents the number of frames. The results show that the gProp Linreg algorithm performs better than the gAmax Linreg and gAmax Window algorithms. According to the researchers, the proposed curriculum learning algorithm based on the notion of mastering rate effectively addresses the shortcomings of learning-progress-based program algorithms, resulting in more sample-efficient and robust learning.
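For readers who want to produce a similar plot from their own runs, the median curve and quartile band can be computed as in the Python sketch below. The function name `quartile_band` and the input layout (one list of returns per random seed, all of equal length) are assumptions, not details from the paper.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def quartile_band(
    runs: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float], List[float]]:
    """Return (first quartile, median, third quartile) at each step.

    `runs` holds one return curve per seed; at least two seeds are needed.
    """
    q1, median, q3 = [], [], []
    for step_values in zip(*runs):  # values of all seeds at one step
        low, mid, high = quantiles(step_values, n=4, method="inclusive")
        q1.append(low)
        median.append(mid)
        q3.append(high)
    return q1, median, q3
```

The median is then drawn as the main line and the region between the two quartile curves is shaded.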
The paper Mastering Rate based Curriculum Learning is on arXiv.
Analyst: Hecate He | Editor: Michael Sarazen