A new study on learned legged locomotion uses massive parallelism on a single GPU to get robots up and walking on flat terrain in under four minutes, and on uneven terrain in twenty minutes.
Although deep reinforcement learning (DRL) has achieved impressive results in robotics, the amount of data required to train a policy increases dramatically with task complexity. One way to improve the quality and time-to-deployment of DRL policies is to use massive parallelism.
In the paper Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning, a research team from ETH Zurich and NVIDIA proposes a training framework that enables fast policy generation for real-world robotic tasks using massive parallelism on a single workstation GPU. Compared to previous methods, the approach can reduce training time by multiple orders of magnitude.
Current on-policy reinforcement learning approaches comprise two stages: data collection and policy updates. The policy updates correspond to back-propagation through neural networks, which is relatively easy to perform in parallel on a single GPU. Parallelizing data collection, however, involves policy inference, simulation, and reward and observation calculations, and remains a challenging research area for several reasons. First, running policy inference on a GPU while simulation runs on the CPU creates communication bottlenecks. Second, data transfer over Peripheral Component Interconnect Express (PCIe) can be 50 times slower than the GPU processing time alone. Third, sending large amounts of data to the GPU for each policy update tends to slow down the entire process.
To overcome these issues, the researchers explore the potential of massive parallelism with end-to-end data collection and policy updates on the GPU, aiming to improve the efficiency of DRL training.
The proposed DRL algorithm is built upon the Proximal Policy Optimization (PPO) algorithm and is designed to perform every operation and store all data on a GPU to enable efficient learning from thousands of robots in parallel.
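For readers unfamiliar with PPO, its core is a clipped surrogate objective that limits how far each update can move the policy. The sketch below is a minimal pure-Python illustration of that standard objective, not the researchers' GPU implementation; the function name, the argument names and the clip value of 0.2 are assumptions chosen for illustration.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    All names and the default clip_eps are illustrative assumptions.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a massively parallel setting the same computation would be applied to a batch gathered from thousands of simulated robots at once, typically as tensor operations rather than a Python loop.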
The study found that it is important to train the locomotion policy on less challenging terrain before progressively increasing complexity. The researchers therefore introduced a game-inspired automatic curriculum that trains each robot at a level of difficulty tailored to its performance, without requiring any external tuning.
This curriculum is also well suited to the massively parallel regime: with thousands of robots training at once, their current progress through the curriculum serves directly as an estimate of the distribution of the policy’s performance. Because no tuning is required, it can be implemented in parallel at near-zero processing cost.
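A game-style curriculum of this kind can be sketched as a per-robot level update applied after each episode. The code below is only an illustration under stated assumptions: the promotion rule (walking past the terrain border), the demotion rule (covering less than half the commanded distance), the random reassignment after clearing the top level, and all function and parameter names are hypothetical and not confirmed by this article.

```python
import random


def next_level(level, distance, border_distance, commanded_distance, num_levels, rng):
    """Return the terrain level for one robot's next episode (hypothetical rules)."""
    if distance >= border_distance:
        # Assumed promotion rule: the robot walked past the border of its terrain.
        if level == num_levels - 1:
            # Assumed: robots that clear the hardest level restart at a random level.
            return rng.randrange(num_levels)
        return level + 1
    if distance < 0.5 * commanded_distance:
        # Assumed demotion rule: less than half the commanded distance was covered.
        return max(level - 1, 0)
    return level


def update_levels(levels, distances, border_distance, commanded_distances,
                  num_levels, rng=None):
    """Apply the level update independently to every robot."""
    rng = rng or random.Random(0)
    return [
        next_level(lvl, dist, border_distance, cmd, num_levels, rng)
        for lvl, dist, cmd in zip(levels, distances, commanded_distances)
    ]


# Four robots on a 10-level curriculum; distances in meters are made-up values.
new_levels = update_levels(
    levels=[0, 3, 2, 0],
    distances=[5.0, 1.0, 3.0, 0.5],
    border_distance=4.0,
    commanded_distances=[4.0, 4.0, 4.0, 4.0],
    num_levels=10,
)
print(new_levels)  # [1, 2, 2, 0]
```

Because each robot's update depends only on its own episode, the rule needs no global tuning and maps naturally onto a batched, per-robot operation.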
The team conducted a number of evaluation experiments, training robots with the proposed DRL algorithm in the Isaac Gym physics simulation environment on a single workstation GPU.
The experiments yielded two interesting observations: 1) when the number of robots is too high, performance drops sharply; 2) with larger batch sizes, the overall reward is higher and the time horizon effect is shifted.
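One way to read the first observation, assuming the batch size per policy update is the number of robots multiplied by the number of steps each robot collects (an assumption not stated in this article), is that a fixed batch leaves each robot a shorter time horizon as the robot count grows. The arithmetic below uses illustrative values and a hypothetical helper name.

```python
def steps_per_robot(batch_size, num_robots):
    """Steps each robot contributes to one update, assuming batch = robots * steps."""
    return batch_size // num_robots


BATCH_SIZE = 98_304  # illustrative value
for num_robots in (256, 4096, 16_384):
    print(num_robots, steps_per_robot(BATCH_SIZE, num_robots))
# 256 384
# 4096 24
# 16384 6
```

Under this assumption, very large robot counts leave only a handful of consecutive steps per robot in each update, which is consistent with the reported interplay between robot count, batch size and time horizon.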
The researchers also conducted simulation and deployment experiments, where they observed a nearly 100 percent success rate for climbing and descending steps up to 0.2 m (the hardest stair difficulty).
Overall, the study shows that using an end-to-end GPU pipeline with thousands of robots simulated in parallel, combined with the proposed curriculum structure, can reduce the training time required to teach robots to walk by multiple orders of magnitude compared to current baselines. The team says they hope their study can change researchers’ perspectives on the required training time for real-world robotics applications, and that other tasks may also benefit from the massively parallel regime.
The paper Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning is on arXiv.
Author: Hecate He | Editor: Michael Sarazen