Significant progress has been made in recent years on learning techniques that enable robots to perform a variety of manipulation tasks and generalize to novel scenarios. This progress, however, relies heavily on large-scale datasets, which are challenging to build and scale, as they typically require either human demonstrations or engineering-heavy autonomous data collection schemes.
In the new paper Scaling Robot Learning with Semantically Imagined Experience, a team from Robotics at Google and Google Research proposes Robot Learning with Semantically Imagined Experience (ROSIE), a general and semantically-aware data augmentation strategy that bypasses demanding data acquisition processes by leveraging text-to-image foundation models to generate data for robot learning.
Generative diffusion models can capture complex data distributions and have demonstrated tremendous abilities in text-to-image generation. While such models are best known for their performance on computer vision and language-conditioned generation tasks, they can also be used for data augmentation.
Inspired by the capabilities of off-the-shelf text-guided diffusion models (whose priors are informed by massive real-world training data), the team explores how such models might improve robotic learning and generalization by generating semantically meaningful augmentations on top of existing robotic datasets to scale up training data.
Given a natural language prompt describing the target region, the team’s approach first localizes the region of the image to be augmented with an open-vocabulary segmentation model, which generates a mask of the area relevant to the prompt. ROSIE then performs inpainting on the selected mask using Imagen Editor, inserting unseen but semantically consistent objects specified by the augmentation text instruction.
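The two-stage pipeline can be sketched as follows. This is a minimal structural illustration, not the paper’s implementation: `toy_segment` and `toy_inpaint` are hypothetical stand-ins for the open-vocabulary segmentation model and the Imagen Editor inpainter, which are not publicly exposed as simple function calls.

```python
import numpy as np

def rosie_augment(image, target_phrase, augmentation_prompt, segment, inpaint):
    """Sketch of ROSIE-style augmentation, assuming two pluggable models:

    1. segment(image, target_phrase) -> boolean mask of the region
       named by the natural-language phrase (open-vocabulary segmentation).
    2. inpaint(image, mask, augmentation_prompt) -> image with only the
       masked pixels redrawn per the augmentation text (diffusion inpainting).
    """
    mask = segment(image, target_phrase)             # (H, W) boolean mask
    augmented = inpaint(image, mask, augmentation_prompt)
    # Pixels outside the mask stay untouched, so robot-relevant scene
    # content (arm, gripper, table) remains physically consistent.
    assert np.array_equal(augmented[~mask], image[~mask])
    return augmented

# --- purely illustrative toy stand-ins ---
def toy_segment(image, phrase):
    # Pretend the target object is wherever the image is bright.
    return image.mean(axis=-1) > 0.5

def toy_inpaint(image, mask, prompt):
    out = image.copy()
    out[mask] = 0.25  # "paint" a new object into the masked region
    return out

img = np.random.rand(8, 8, 3)
aug = rosie_augment(img, "metal sink", "a wooden drawer",
                    toy_segment, toy_inpaint)
```

The key design point this sketch captures is that augmentation is restricted to the masked region, so the generated training images remain consistent with the original robot trajectory everywhere else.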
In their empirical study, the team evaluated ROSIE on various robot manipulation and embodied reasoning tasks. The results confirm that ROSIE’s data augmentation boosts learned models’ generalization abilities to unseen tasks with new objects and improves their robustness to distractors and backgrounds.
Author: Hecate He | Editor: Michael Sarazen