From YouTube to Keys: Transforming Internet Data into Robotic Musical Talent

Synced

2 years ago

The Internet holds immense potential as a vast source of data for training versatile robotic agents. However, harnessing this data effectively is challenging. Previous research has shown success in teaching robots manipulation skills through observations, but these methods have often been constrained by the robots’ limited dexterity or the narrow range of tasks they could perform.

In a new paper PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations, a research team from TU Munich, TU Darmstadt and UC Berkeley introduces PianoMime, a framework for training a robot to play the piano using internet-sourced demonstrations. The task is notably complex, requiring high dexterity, as the agent must use two multi-fingered robot hands to press the correct keys at the appropriate times for a given song.

The primary contributions of this research include:

Developing a method to learn policies from internet demonstrations by separating human motion data from task-specific information.
Presenting a reinforcement learning approach that integrates residual policy learning with style reward-based strategies.
Exploring various policy architecture designs, including new strategies for learning geometrically consistent latent features and conducting ablation studies on different designs.
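The second contribution can be made concrete with a small example. The following is a minimal sketch, not the authors' implementation: it assumes a nominal action derived from the human demonstration, a learned correction added on top of it (the residual), and a style term that rewards staying close to the demonstrated fingertip positions. The function names, the exponential form of the style term, and the weights are all illustrative assumptions.

```python
import math


def combine_residual(nominal_action, residual, scale=0.1):
    """Residual policy learning: final action = nominal action + scaled correction.

    `nominal_action` is assumed to come from the demonstration, and `residual`
    from the learned policy. `scale` is a hypothetical hyperparameter.
    """
    return [a + scale * r for a, r in zip(nominal_action, residual)]


def style_reward(fingertips, demo_fingertips, sharpness=10.0):
    """Reward in (0, 1] that decays with squared distance to the demonstration.

    The exponential shape is an assumption chosen for illustration.
    """
    sq_dist = sum((p - q) ** 2 for p, q in zip(fingertips, demo_fingertips))
    return math.exp(-sharpness * sq_dist)


def total_reward(task_reward, fingertips, demo_fingertips, style_weight=0.5):
    """Task reward (pressing the right keys) plus a weighted style term."""
    return task_reward + style_weight * style_reward(fingertips, demo_fingertips)
```

In this sketch the residual only has to correct the nominal motion rather than learn the whole movement from scratch, and the style term keeps the corrected motion close to what the human pianist did.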

At its core, the PianoMime agent is a goal-conditioned policy that generates actions based on the desired song. For each timestep, the agent receives a trajectory of the keys to be pressed and generates a corresponding action trajectory to execute.
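To illustrate what "a trajectory of the keys to be pressed" could look like as a policy input, here is a minimal sketch that slices a lookahead window out of a piano roll. The piano-roll representation (one row per timestep, one 0/1 entry per key), the `horizon` parameter and the silence padding are assumptions made for this example, not details taken from the paper.

```python
def goal_window(piano_roll, t, horizon):
    """Return the key targets for timesteps t .. t + horizon - 1.

    `piano_roll` is a list of rows, one per timestep, each a list of 0/1
    flags per key. Timesteps past the end of the song are padded with
    silence (all zeros).
    """
    num_keys = len(piano_roll[0])
    silence = [0] * num_keys
    return [
        piano_roll[i] if i < len(piano_roll) else silence
        for i in range(t, t + horizon)
    ]


# Toy song with two keys and three timesteps.
roll = [[1, 0], [0, 1], [1, 1]]
window = goal_window(roll, t=1, horizon=3)  # last row is padding
```

A goal-conditioned policy would take such a window, together with the robot's state, and output the actions for the corresponding timesteps.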

The PianoMime framework consists of three main phases:

Data Preparation: Extracting informative features from YouTube videos.
Policy Learning: Training song-specific expert policies from the demonstrations.
Policy Distillation: Combining the expert policies into a single, generalist agent.

To train the agent, the researchers used a combination of reinforcement learning and imitation learning. They developed song-specific expert policies using reinforcement learning with YouTube demonstrations and then distilled these policies into a generalist behavioral cloning policy.
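The distillation step can be sketched as follows. This is a simplified illustration under stated assumptions: each expert is treated as a plain function from a goal to an action, the pooled dataset holds (goal, expert action) pairs, and the behavioral cloning objective is taken to be mean squared error. The names `collect_dataset` and `bc_loss` are hypothetical.

```python
def collect_dataset(experts, songs):
    """Query each song-specific expert on its own song and pool the pairs.

    `experts` maps a song name to a callable goal -> action.
    `songs` maps a song name to a list of goals (one per timestep).
    """
    dataset = []
    for name, goals in songs.items():
        expert = experts[name]
        for goal in goals:
            dataset.append((goal, expert(goal)))
    return dataset


def bc_loss(student, dataset):
    """Mean squared error between student and expert actions."""
    total = 0.0
    count = 0
    for goal, expert_action in dataset:
        predicted = student(goal)
        total += sum((p - a) ** 2 for p, a in zip(predicted, expert_action))
        count += len(expert_action)
    return total / count
```

Training the generalist then amounts to minimizing this loss over the pooled data from all songs, so a single student has to reproduce what many separate experts do.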

The team experimented with different architectural design strategies to model the behavioral cloning policy. They also investigated the effectiveness of a hierarchical policy that combines a high-level policy for generating fingertip trajectories with a learned cross-domain inverse dynamics model for generating joint-space actions.
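The hierarchical design described above is essentially a composition of two models. The sketch below shows only that structure, with both models passed in as plain callables. The signatures are assumptions for illustration, and the stand-ins in the usage lines are toy functions, not learned models.

```python
def hierarchical_policy(high_level, inverse_dynamics):
    """Compose a fingertip planner with an inverse dynamics model.

    `high_level` maps a goal to target fingertip positions.
    `inverse_dynamics` maps (joint state, fingertip targets) to joint actions.
    """
    def act(goal, joint_state):
        fingertip_targets = high_level(goal)
        return inverse_dynamics(joint_state, fingertip_targets)

    return act


# Toy stand-ins: the planner lowers a fingertip over each active key, and
# the inverse dynamics model moves each joint toward its target.
planner = lambda goal: [-0.01 if pressed else 0.0 for pressed in goal]
inv_dyn = lambda joints, targets: [t - j for j, t in zip(joints, targets)]
policy = hierarchical_policy(planner, inv_dyn)
action = policy([1, 0], [0.0, 0.0])
```

Splitting the policy this way means the high-level model works in fingertip space, where the human demonstrations live, and only the inverse dynamics model has to deal with the robot's joints.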

Empirical results show that the PianoMime agent can play new songs not included in the training set with an F1-score of approximately 56%. This generalization to unseen songs demonstrates the significant potential of using internet data for training generalist robotic agents.
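For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision (the share of pressed keys that were correct) and recall (the share of required keys that were pressed). The sketch below computes it over a whole piano roll. Whether the paper pools counts across the song or averages per timestep is not stated here, so the pooling used in this example is an assumption.

```python
def key_press_f1(predicted, target):
    """F1-score of predicted key presses against the target piano roll.

    Both arguments are lists of rows (one per timestep) of 0/1 flags per key.
    Counts are pooled over all timesteps and keys.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(predicted, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # correct key pressed
            elif p and not t:
                fp += 1  # wrong key pressed
            elif t and not p:
                fn += 1  # required key missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


predicted = [[1, 0, 1], [0, 1, 0]]
target = [[1, 1, 0], [0, 1, 0]]
score = key_press_f1(predicted, target)  # 2 hits, 1 wrong key, 1 missed key
```

In the toy example precision and recall are both 2/3, so the F1-score is also 2/3.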

Project website: https://pianomime.github.io/. The paper PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations is on arXiv.

Author: Hecate He | Editor: Chain Zhang
