Behavioural cloning (BC) is a simple yet powerful machine learning approach for acquiring robotic skills in the real world. BC treats the imitation of expert demonstrations as a supervised learning problem, in which policies can be represented by explicit continuous feed-forward models that map input observations to output actions. BC performance thus relies heavily on the selected explicit models, but it is difficult to ensure the most appropriate models have been chosen.
In a departure from the use of such explicit models, a new Robotics at Google paper proposes reformulating BC using implicit models, and demonstrates that this simple change can lead to remarkable performance improvements across a wide range of contact-rich robot policy learning scenarios.
The Google researchers formulate imitation as a conditional energy-based modelling (EBM) problem. Unlike explicit policies, which directly map inputs to outputs, EBM-based implicit policies take both an observation and a candidate action as input and assign the pair a scalar energy. At inference time, the policy outputs the action that minimizes this energy for the given observation.
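To make the idea of an implicit policy concrete, here is a minimal Python sketch of inference by energy minimization. It is an illustration under stated assumptions, not the paper's implementation: `toy_energy` is a hypothetical hand-written stand-in for a learned energy network, and `implicit_policy`, its action bounds and its sample count are invented for this example. The minimization is approximated by sampling candidate actions uniformly and keeping the one with the lowest energy.

```python
import numpy as np


def toy_energy(obs, actions):
    """Hypothetical energy function standing in for a learned network.

    Energy is lowest when the action equals sin(obs); a real implicit
    policy would learn this landscape from demonstrations instead.
    """
    target = np.sin(obs)
    return (actions - target) ** 2


def implicit_policy(obs, energy_fn, low=-1.0, high=1.0, num_samples=4096, seed=0):
    """Approximate argmin over actions of energy_fn(obs, action) by sampling."""
    rng = np.random.default_rng(seed)
    # Draw candidate actions uniformly within the (assumed) action bounds.
    candidates = rng.uniform(low, high, size=num_samples)
    # Score every (observation, action) pair with the energy function.
    energies = energy_fn(obs, candidates)
    # The policy output is the lowest-energy candidate.
    return candidates[np.argmin(energies)]


action = implicit_policy(0.5, toy_energy)
print(f"selected action: {action:.3f}, energy minimum at: {np.sin(0.5):.3f}")
```

An explicit policy would compute the action in a single forward pass. The implicit version instead pays for many energy evaluations per decision, which is one reason such policies tend to be more compute-hungry.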
Explicit models fit a continuous function to the data, and therefore must factor in every intermediate value between training samples. If the frequency of discontinuities increases, the performance of explicit models is reduced. Implicit models meanwhile can approximate discontinuities without introducing intermediate artifacts; thus, unlike explicit models, their predictions remain sharp at the discontinuities while also respecting local continuities.
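The contrast at a discontinuity can be sketched on a one-dimensional step function. This is a toy illustration, not an experiment from the paper: linear interpolation stands in for a continuous explicit fit, and `energy` is a hypothetical hand-built function (squared distance to the nearest training pair) standing in for a learned energy model.

```python
import numpy as np

# Training samples from a step function that jumps from 0 to 1 at x = 0.
x_train = np.array([-0.3, -0.2, -0.1, 0.1, 0.2, 0.3])
y_train = np.where(x_train < 0, 0.0, 1.0)


def explicit_predict(x):
    """Continuous fit: linearly interpolate between training samples."""
    return float(np.interp(x, x_train, y_train))


def energy(x, y):
    """Hypothetical energy: squared distance from (x, y) to the nearest training pair."""
    return np.min((x - x_train) ** 2 + (y - y_train) ** 2)


def implicit_predict(x, candidates=np.linspace(-0.5, 1.5, 201)):
    """Return the candidate output with the lowest energy for this input."""
    energies = np.array([energy(x, y) for y in candidates])
    return float(candidates[np.argmin(energies)])


# Query a point inside the gap between the two samples that straddle the jump.
x_query = 0.04
explicit_y = explicit_predict(x_query)
implicit_y = implicit_predict(x_query)
print(f"explicit: {explicit_y:.2f}, implicit: {implicit_y:.2f}")
```

In this sketch the interpolating model returns an intermediate value that never appears in the training data, while the energy minimizer snaps to one side of the jump.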
To evaluate the effectiveness of their proposed models, the team tested implicit models for learning BC policies across a variety of robotic task domains: D4RL (a recent benchmark for offline reinforcement learning); Particle Integrator (a simple environment with linear dynamics); Simulated Pushing (pushing a block into the target goal zone); Planar Sweeping (pushing a pile of 50 to 100 randomly positioned particles into a green goal zone); Simulated Bi-Manual Sweeping (scooping up randomly configured particles from a 0.4 m² workspace and transporting them into two bowls); and Real Robot Manipulation (real-world manipulation pushing tasks).
In the experiments, implicit models were competitive with or outperformed state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite. On the real-world contact-rich tasks, robots using the proposed implicit policies were able to learn complex and remarkably subtle behaviours, even on demanding tasks that required 1 mm precision.
Although implicit policies are generally more compute-hungry than explicit policies, the researchers show that the time needed to train implicit policies for real-time vision-based control in the real world can be modest compared to that of offline RL algorithms. The paper also presents a novel intuitive analysis of energy-based model characteristics and their potential benefits, and develops a distinct notion of universal approximation for implicit models.
The paper Implicit Behavioral Cloning is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang