Berkeley Artificial Intelligence Research (BAIR) has introduced a new reinforcement learning (RL) method, Stochastic Optimal Control with Latent Representations (SOLAR), which can help robots quickly learn tasks such as stacking blocks or pushing objects from visual inputs.
The research community is broadly moving from model-free RL methods to model-based approaches in pursuit of data efficiency. Model-based RL methods, however, still require many interactions in image-based learning to accurately predict future images. This has led researchers to develop alternative methods that do not require accurate future prediction, such as LQR-FLM (LQR with fitted linear models), one of the most data-efficient RL methods for learning control skills, which models state dynamics as approximately linear. LQR-FLM, however, fails to perform image-based tasks efficiently because the dynamics of raw camera pixels are not linear.
How can robots perform image-based tasks while still leveraging the strengths of LQR-FLM? The researchers found the answer in a latent variable model that explicitly represents the latent linear dynamics of images, and combined it with LQR-FLM to form the foundation of their SOLAR algorithm.
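To make the idea concrete, once images are encoded into a latent state whose dynamics are (approximately) linear, planning reduces to classical LQR. The sketch below is not the SOLAR implementation; it is a minimal illustration of the LQR piece, assuming a hypothetical hand-coded linear latent model (`A`, `B`) standing in for the learned one, with a quadratic cost toward a latent goal at the origin.

```python
import numpy as np

def lqr_gains(A, B, Q, R, horizon):
    """Finite-horizon discrete-time LQR via the backward Riccati recursion.

    Returns time-indexed feedback gains K_t such that u_t = -K_t @ z_t
    minimizes sum_t (z_t' Q z_t + u_t' R u_t) under z_{t+1} = A z_t + B u_t.
    """
    P = Q.copy()          # terminal cost-to-go
    gains = []
    for _ in range(horizon):
        # K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]    # reorder so gains[0] applies at t = 0

# Toy latent dynamics (hypothetical stand-in for the learned model):
# a discretized double integrator, z = [position, velocity].
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)             # penalize distance from the latent goal (origin)
R = 0.1 * np.eye(1)       # penalize control effort

gains = lqr_gains(A, B, Q, R, horizon=100)

# Roll out the controller from a state away from the goal.
z = np.array([1.0, 0.0])
for K in gains:
    u = -K @ z
    z = A @ z + B @ u
print(np.linalg.norm(z))  # residual latent error; should be small
```

In SOLAR itself, `A` and `B` are not fixed but inferred from image data by the latent variable model, and the controller is refit as the model improves; the point here is only that linear latent dynamics make this planning step cheap and closed-form.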
Berkeley researchers utilized a Sawyer robotic arm — which has seven degrees of freedom and performs various manipulation tasks — as their main testbed. They trained the robot with camera images of the arm and related objects in the scene. The robot was then tasked with learning Lego block stacking and mug pushing tasks, and comparison experiments were conducted with SOLAR, a standard variational auto-encoder (VAE) method, a model-predictive control (MPC) method, and deep visual foresight (DVF).
Results showed that DVF struggled to solve tasks in harder settings even when given more data than SOLAR. In the Lego test, the Sawyer arm under SOLAR quickly learned stacking from image observations starting at just three initial positions. It also learned to push the mug onto a coaster within one hour of training with sparse rewards. In both tasks, DVF learned neither as efficiently nor as quickly as SOLAR.
The researchers also compared SOLAR with robust locally-linear controllable embeddings (RCE), which learns latent state representations that follow linear dynamics; and with proximal policy optimization (PPO), a model-free RL method used to solve simulated robotics domains. SOLAR learned faster and achieved better final performance than RCE; and although PPO achieved better final performance than SOLAR, it also required one to three orders of magnitude more data, which is prohibitive for most real-world learning tasks.
The researchers believe SOLAR can be improved to learn more complex, multi-stage tasks such as building Lego structures, or to leverage advanced hardware such as dexterous hands more efficiently. Their goal is to strengthen SOLAR's ability to deal with complex, real-world environments. Read more on the BAIR blog.
Author: Yuqing Li | Editor: Michael Sarazen