
Google’s ROSIE Data Augmentation Strategy Scales Robot Learning With Semantically Imagined Experience

In a new paper, Scaling Robot Learning with Semantically Imagined Experience, a team from Robotics at Google and Google Research proposes Robot Learning with Semantically Imagined Experience (ROSIE), a general and semantically-aware data augmentation strategy that leverages text-to-image models to obtain data for robot learning.

Significant progress has been made in recent years on learning techniques that enable robots to perform a variety of manipulation tasks and generalize well to novel scenarios. This progress, however, relies heavily on large-scale datasets, which are challenging to build and scale, as they typically require either human demonstrations or engineering-heavy autonomous data collection schemes.

In the new paper Scaling Robot Learning with Semantically Imagined Experience, a team from Robotics at Google and Google Research proposes Robot Learning with Semantically Imagined Experience (ROSIE), a general and semantically-aware data augmentation strategy that bypasses demanding data acquisition processes by leveraging text-to-image foundation models to generate data for robot learning.

Generative diffusion models can model complex distributions and have demonstrated tremendous abilities in text-to-image generation. While such models are better known for their performance on computer vision and natural language processing tasks, they can also be used for data augmentation.

Inspired by the capabilities of off-the-shelf text-guided diffusion models (whose priors are informed by massive real-world training data), the team explores how such models might improve robotic learning and generalization by generating semantically meaningful augmentations on top of existing robotic datasets to scale up training data.

The team’s approach first localizes the region of an image to augment: given a natural language prompt, an open-vocabulary segmentation model generates a mask of the target region that the prompt describes. Given the augmentation text, ROSIE then performs inpainting on the selected mask using Imagen Editor, adding unseen but semantically accurate objects that match the augmented text instruction.
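
The two-stage flow described above (mask the target region, then inpaint it and update the instruction) can be illustrated with a minimal Python sketch. This is not the team's implementation: the names Episode, composite, augment_episode, toy_segment and toy_inpaint are hypothetical, images are toy grids of grayscale values, and the two toy functions are simple stand-ins for the open-vocabulary segmentation model and Imagen Editor, whose real interfaces are not described in this article.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]   # toy grayscale image: rows of pixel values
Mask = List[List[bool]]   # True marks pixels that may be repainted


@dataclass
class Episode:
    """Hypothetical container: one language instruction plus its frames."""
    instruction: str
    frames: List[Image]


def composite(original: Image, generated: Image, mask: Mask) -> Image:
    """Keep original pixels outside the mask; take generated pixels inside it."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


def augment_episode(
    episode: Episode,
    region_prompt: str,
    augmentation_text: str,
    new_instruction: str,
    segment: Callable[[Image, str], Mask],
    inpaint: Callable[[Image, Mask, str], Image],
) -> Episode:
    """Mask the prompted region in every frame, repaint it, relabel the episode."""
    new_frames = []
    for frame in episode.frames:
        mask = segment(frame, region_prompt)                 # stage 1: localize
        generated = inpaint(frame, mask, augmentation_text)  # stage 2: generate
        new_frames.append(composite(frame, generated, mask))
    return Episode(instruction=new_instruction, frames=new_frames)


def toy_segment(image: Image, prompt: str) -> Mask:
    """Stand-in segmenter (assumption): treats bright pixels as the target."""
    return [[pixel >= 200 for pixel in row] for row in image]


def toy_inpaint(image: Image, mask: Mask, text: str) -> Image:
    """Stand-in generator (assumption): fills the image with one flat value."""
    fill = len(text) % 256
    return [[fill for _ in row] for row in image]


original = Episode("pick up the can", [[[10, 250], [220, 30]]])
augmented = augment_episode(
    original,
    region_prompt="can",
    augmentation_text="a red apple",
    new_instruction="pick up the red apple",
    segment=toy_segment,
    inpaint=toy_inpaint,
)
print(augmented.instruction)
print(augmented.frames[0])
```

In this sketch, only the masked pixels change, so the rest of each frame is carried over untouched, and the original episode is left intact so that both versions can be kept as training data.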

In their empirical study, the team evaluated ROSIE on various robot manipulation and embodied reasoning tasks. The results confirm that ROSIE’s data augmentation boosts learned models’ generalization abilities to unseen tasks with new objects and improves their robustness to distractors and backgrounds.

The project website is here: https://diffusion-rosie.github.io. The paper Scaling Robot Learning with Semantically Imagined Experience is on arXiv.


Author: Hecate He | Editor: Michael Sarazen



