Using reinforcement learning (RL) to train robots directly in real-world environments has long been considered impractical because of the huge amount of trial and error typically required before the agent succeeds. Deep RL in simulated environments has thus become the go-to alternative, but this approach is far from ideal, as it requires designing simulated tasks and collecting expert demonstrations. Moreover, simulations can fail to capture the complexities of real-world environments and are prone to inaccuracies, and the resulting robot behaviours do not adapt to real-world environmental changes.
The Dreamer algorithm proposed by Hafner et al. at ICLR 2020 introduced an RL agent capable of solving long-horizon tasks purely via latent imagination. Although Dreamer has demonstrated its potential for learning from small amounts of interaction in the compact state space of a learned world model, learning accurate real-world models remains challenging, and it was unknown whether Dreamer could enable faster learning on physical robots.
In the new paper DayDreamer: World Models for Physical Robot Learning, Hafner and a research team from the University of California, Berkeley leverage recent advances in the Dreamer world model to enable online RL for robot training without simulators or demonstrations. The novel approach achieves promising results and establishes a strong baseline for efficient real-world robot training.
The team summarizes their main contributions as:
- Dreamer on Robots: We apply Dreamer to 4 robots, demonstrating successful learning directly in the real world, without introducing new algorithms.
- Walking in 1 Hour: We teach a quadruped from scratch in the real world to roll off its back, stand up, and walk in only 1 hour. Afterward, we find that the robot adapts to being pushed within 10 minutes, learning to withstand pushes or quickly roll over and get back on its feet.
- Visual Pick and Place: We train robotic arms to pick and place objects from sparse rewards, which requires localizing objects from pixels and fusing images with proprioceptive inputs. The learned behaviour outperforms model-free agents and approaches human performance.
- Open Source: We publicly release the software infrastructure for all our experiments, which supports different action spaces and sensory modalities, offering a flexible platform for future research of world models for robot learning in the real world.
Dreamer learns its world model from a replay buffer of past experiences. It adopts an actor-critic algorithm to learn behaviours from the learned model’s predicted trajectories, then deploys these behaviours in the environment to continuously grow the replay buffer. In the new paper’s implementation for online RL in the real world, the world model and actor-critic behaviour are continuously trained by a learner thread, while a parallel actor thread computes actions for environment interaction.
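The split between a learner thread and an actor thread described above can be illustrated with a minimal sketch. This is not the DayDreamer code: the `ReplayBuffer` class, the `actor_loop`, `learner_loop` and `run_online_training` functions, the toy one-dimensional dynamics, and the placeholder update are all assumptions made for illustration. The sketch shows only the concurrency layout, in which one thread collects experience into a shared replay buffer while another samples from that buffer to train, and it leaves out the world model and actor-critic updates entirely.

```python
import random
import threading
import time
from collections import deque


class ReplayBuffer:
    """Thread-safe buffer of past transitions (hypothetical helper)."""

    def __init__(self, capacity=10_000):
        self._data = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._data.append(transition)

    def sample(self, batch_size):
        # Returns None until enough experience has been collected.
        with self._lock:
            if len(self._data) < batch_size:
                return None
            return random.sample(list(self._data), batch_size)

    def __len__(self):
        with self._lock:
            return len(self._data)


def actor_loop(buffer, policy, num_steps):
    """Acts in a toy environment and grows the replay buffer."""
    obs = 0.0
    for _ in range(num_steps):
        # Read the latest policy parameters written by the learner.
        action = policy["scale"] * random.uniform(-1.0, 1.0)
        next_obs = 0.9 * obs + action  # toy dynamics standing in for a robot
        reward = -abs(next_obs)
        buffer.add((obs, action, reward, next_obs))
        obs = next_obs


def learner_loop(buffer, policy, stats, num_updates, batch_size):
    """Samples replayed experience and applies a placeholder update."""
    while stats["updates"] < num_updates:
        batch = buffer.sample(batch_size)
        if batch is None:
            time.sleep(0.001)  # wait for the actor to collect more data
            continue
        mean_reward = sum(t[2] for t in batch) / len(batch)
        # Placeholder for world-model and actor-critic training (assumption).
        policy["scale"] = max(0.1, policy["scale"] * 0.99)
        stats["updates"] += 1
        stats["last_mean_reward"] = mean_reward


def run_online_training(num_steps=200, num_updates=50, batch_size=16):
    """Runs both threads; num_steps must be at least batch_size."""
    buffer = ReplayBuffer()
    policy = {"scale": 1.0}
    stats = {"updates": 0, "last_mean_reward": None}
    actor = threading.Thread(target=actor_loop, args=(buffer, policy, num_steps))
    learner = threading.Thread(
        target=learner_loop,
        args=(buffer, policy, stats, num_updates, batch_size),
    )
    actor.start()
    learner.start()
    actor.join()
    learner.join()
    return len(buffer), stats
```

Calling `run_online_training()` returns the final buffer size and the learner statistics. Because the two loops share only the buffer and the policy parameters, environment interaction in this sketch never has to wait for a training step to finish, which is the practical appeal of this kind of layout on a physical robot.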
The team evaluated Dreamer on a variety of challenging tasks spanning locomotion, manipulation, and navigation. The results show that Dreamer can train physical robots to perform behaviours such as rolling off their backs, standing up, and walking, all from scratch and in only about one hour. Dreamer also approached human performance on a task involving picking and placing multiple objects directly from camera images. On a wheeled robot, it learned to navigate to a goal position purely from camera images, automatically resolving ambiguities with regard to robot orientation.
Overall, this work demonstrates Dreamer’s strong potential for sample-efficient physical robot learning of real-world tasks without simulators.
Videos are available on the project website: https://danijar.com/daydreamer. The paper DayDreamer: World Models for Physical Robot Learning is on arXiv.
Author: Hecate He | Editor: Michael Sarazen