Behavioural cloning is a flexible class of approaches used in control and reinforcement learning (RL) settings to transfer behaviours from expert demonstrations, such as those obtained by a human directly teleoperating the relevant system, to a learned student policy. However, behavioural cloning becomes less efficient in settings where an expert policy is also available and can be queried, because it learns only from the expert's recorded actions and does not exploit the ability to ask the expert what it would do elsewhere.
In the new paper Data Augmentation for Efficient Learning from Parametric Experts, a DeepMind research team proposes Augmented Policy Cloning (APC), a simple yet effective data-augmentation approach that enables data-efficient learning from parametric experts in high-degrees-of-freedom control problems. APC significantly improves data efficiency across a range of control and RL settings.
The team works within the standard RL setting, in which the goal is to find a policy that maximizes the system’s expected discounted future reward, and exploits additional information in the neighbourhood of trajectories sampled from expert rollouts. Rather than learning only from the actions recorded in those rollouts, the student is trained by querying the expert policy itself, a setting the researchers refer to as “policy cloning.”
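For reference, the standard RL objective and one common way of writing a policy-cloning loss are sketched below. The notation is ours rather than the paper's: π_E is the expert, π_θ the student with parameters θ, γ the discount factor, r the reward, and μ the distribution of states at which the expert is queried (states from expert rollouts in plain cloning; APC additionally includes perturbed states around them). The KL-divergence form of the cloning loss is an assumption, and the exact loss used by the authors may differ.

```latex
\begin{aligned}
J(\pi) &= \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big], \qquad 0 \le \gamma < 1, \\
\mathcal{L}_{\text{clone}}(\theta) &= \mathbb{E}_{s \sim \mu}\Big[ D_{\mathrm{KL}}\big(\pi_E(\cdot \mid s) \,\big\|\, \pi_\theta(\cdot \mid s)\big) \Big].
\end{aligned}
```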
To minimize the amount of data a student needs to collect, the team proposes expert-aware data augmentation. APC applies random perturbations to the state-derived observations along the sampled trajectories, queries the expert for its action at each perturbed observation, and trains the student to match those actions. Because the student sees how the expert responds to states near, and not just on, the recorded trajectories, it can gain sufficient knowledge from the expert without requiring an excessive number of rollouts.
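The augmentation step can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the linear "expert", the two-dimensional state, Gaussian perturbations with a fixed sigma, the squared-error matching loss and the linear student are all illustrative assumptions (the paper's experts are neural network policies, and matching a stochastic expert would more typically use a likelihood or KL-style loss). The names expert_action, augment and fit_linear_student are hypothetical.

```python
import random

# Hypothetical stand-in for a parametric expert: a linear feedback
# controller a = -(1.5 * s[0] - 0.5 * s[1]). The only property APC relies
# on here is that the expert can be queried at arbitrary states.
EXPERT_GAINS = [1.5, -0.5]


def expert_action(state):
    """Query the hypothetical expert policy at any state."""
    return -sum(k * s for k, s in zip(EXPERT_GAINS, state))


def augment(trajectory, num_perturbations=8, sigma=0.5, rng=None):
    """Label each recorded state, plus `num_perturbations` Gaussian-perturbed
    copies of it, with the expert's action (APC-style augmentation)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    dataset = []
    for state in trajectory:
        dataset.append((state, expert_action(state)))
        for _ in range(num_perturbations):
            noisy = [s + rng.gauss(0.0, sigma) for s in state]
            dataset.append((noisy, expert_action(noisy)))
    return dataset


def fit_linear_student(dataset, lr=0.3, epochs=2000):
    """Fit a linear student a = w . s by full-batch gradient descent on the
    mean squared error to the expert's actions."""
    dim = len(dataset[0][0])
    n = len(dataset)
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for state, target in dataset:
            err = sum(wi * si for wi, si in zip(w, state)) - target
            for i in range(dim):
                grad[i] += 2.0 * err * state[i] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Recorded states from a (hypothetical) expert rollout. They all lie on the
# line s[0] == s[1], so on their own they reveal nothing about how the
# expert reacts to deviations off that line.
trajectory = [[1.0, 1.0], [0.5, 0.5], [0.25, 0.25]]

# Plain behavioural cloning: only the recorded (state, action) pairs.
bc_weights = fit_linear_student([(s, expert_action(s)) for s in trajectory])

# APC-style: the same states plus expert-labelled perturbations around them.
apc_weights = fit_linear_student(augment(trajectory))

print("BC student: ", [round(w, 3) for w in bc_weights])
print("APC student:", [round(w, 3) for w in apc_weights])
```

Because every recorded state lies on one line, the behavioural-cloning student settles on weights near [-0.5, -0.5], which reproduce the expert on that line but respond wrongly to deviations from it. The perturbed, expert-labelled states expose the expert's feedback behaviour around the trajectory, so the augmented student's weights approach the expert's gains of [-1.5, 0.5] without any additional rollouts.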
In their empirical study, the team compared APC with baselines such as the naive approach of cloning without data augmentation. APC significantly improved data efficiency in a range of settings, including behavioural cloning, expert compression (transfer to a student with fewer parameters), cloning privileged experts (experts whose observations include information the student lacks), DAgger (Ross et al., 2011), and kickstarting.
Overall, this work introduces a promising approach for efficiently transferring expert behaviours by augmenting expert trajectory data. The team identifies generating and sampling virtual states via a state model as an interesting direction for future research.
The paper Data Augmentation for Efficient Learning from Parametric Experts is on arXiv.
Author: Hecate He | Editor: Michael Sarazen