Behavioural cloning is a flexible class of approaches used in control and reinforcement learning (RL) settings to transfer behaviours from expert demonstrations (such as those obtained by a human directly teleoperating the system) to a learned student policy. Behavioural cloning becomes inefficient, however, in settings where an expert policy itself is available and can be queried, since learning only from a fixed set of demonstrations ignores this additional source of supervision.
In the new paper Data Augmentation for Efficient Learning from Parametric Experts, a DeepMind research team proposes Augmented Policy Cloning (APC), a simple yet effective data-augmentation approach designed to support data-efficient learning from parametric experts on high-degree-of-freedom control problems. The proposed approach significantly improves data efficiency across a variety of control and RL settings.

The team frames the task as a standard RL problem: find a policy that maximizes the system's expected discounted future reward. Their approach leverages additional information in the neighbourhood of trajectories sampled from expert rollouts. Instead of only sampling demonstrations from the expert, it transfers the expert policy itself to a student policy, a setting the researchers refer to as "policy cloning."
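As a rough illustration of the policy-cloning setting (not the paper's implementation), the sketch below fits a student policy to match a queryable expert's actions on states visited during rollouts. The linear expert, the random states standing in for environment dynamics, and all names are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "expert" policy mapping a 3-dim state to a 2-dim action.
W_expert = rng.normal(size=(3, 2))

def expert(s):
    return s @ W_expert

# States visited during rollouts (random states stand in for dynamics here).
states = rng.normal(size=(500, 3))
actions = expert(states)  # query the expert for target actions

# Policy cloning: fit a linear student by least squares so that
# student(s) matches the expert's action at every visited state.
W_student, *_ = np.linalg.lstsq(states, actions, rcond=None)

max_err = np.abs(states @ W_student - actions).max()
```

Because the toy expert is linear and the visited states span the state space, the least-squares student recovers it exactly; a neural student trained by gradient descent would play the same role in practice.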

To minimize the amount of data the student needs to collect, the team proposes expert-aware data augmentation. Their APC approach applies random perturbations to the state-derived observations and trains the student to match the optimal action returned by querying the expert at each perturbed state, so the student can gain sufficient knowledge from the expert without requiring an excessive number of rollouts.
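The augmentation step described above can be sketched as follows. This is a minimal illustration assuming a queryable linear expert and Gaussian state perturbations; the noise scale, perturbation count, and all names are hypothetical rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical queryable expert; in APC this is a parametric expert policy.
W_expert = rng.normal(size=(3, 2))

def expert(s):
    return s @ W_expert

# States visited along an expert rollout.
rollout_states = rng.normal(size=(100, 3))

# Expert-aware augmentation (sketch): perturb each visited state with small
# Gaussian noise and query the expert AT the perturbed state, so each label
# is the expert's action there rather than a copy of the original label.
n_aug, sigma = 10, 0.1
perturbed = (rollout_states[:, None, :]
             + sigma * rng.normal(size=(100, n_aug, 3))).reshape(-1, 3)
aug_states = np.concatenate([rollout_states, perturbed])
aug_actions = expert(aug_states)  # fresh expert queries at perturbed states

# The student now trains on 11x the state-action pairs from one rollout.
```

The key design choice is that labels come from fresh expert queries rather than from reusing the original actions, which is what makes the augmented pairs consistent with the expert policy.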


In their empirical study, the team compared the proposed APC approach against baselines, including naive cloning without augmentation. In the tests, APC significantly improved data efficiency in settings such as behavioural cloning, expert compression, cloning privileged experts, DAgger (Ross et al., 2011), and kickstarting.
Overall, this work introduces a promising approach for the efficient transfer of expert behaviours by augmenting expert trajectory data. The team says an interesting future research direction in this area could involve generating and sampling virtual states via a state model.
The paper Data Augmentation for Efficient Learning from Parametric Experts is on arXiv.
Author: Hecate He | Editor: Michael Sarazen

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
