Synced

From YouTube to Keys: Transforming Internet Data into Robotic Musical Talent

The Internet holds immense potential as a vast source of data for training versatile robotic agents. However, harnessing this data effectively is challenging. Previous research has shown success in teaching robots manipulation skills through observations, but these methods have often been constrained by the robots’ limited dexterity or the narrow range of tasks they could perform.

In a new paper PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations, a research team from TU Munich, TU Darmstadt and UC Berkeley introduces PianoMime, a framework for training a robot to play the piano using internet-sourced demonstrations. The task is notably complex, requiring high dexterity, as the agent must use two multi-fingered robot hands to press the correct keys at the appropriate times for a given song.

The primary contributions of this research include:

At its core, the PianoMime agent is a goal-conditioned policy that generates actions based on the desired song. For each timestep, the agent receives a trajectory of the keys to be pressed and generates a corresponding action trajectory to execute.
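To make the input and output of such a goal-conditioned policy concrete, here is a minimal sketch. It is not the PianoMime implementation: the linear-plus-tanh policy, the class and function names, and every dimension other than the 88 piano keys are hypothetical choices made only for illustration.

```python
import numpy as np

NUM_KEYS = 88        # keys on a standard piano
LOOKAHEAD = 4        # hypothetical: goal timesteps shown to the policy
ACTION_HORIZON = 2   # hypothetical: action steps produced per call
ACTION_DIM = 6       # hypothetical: size of one action vector
STATE_DIM = 5        # hypothetical: size of the robot state vector


def goal_window(piano_roll, t):
    """Return the keys to press for steps t .. t+LOOKAHEAD-1, zero-padded past the song's end."""
    window = np.zeros((LOOKAHEAD, NUM_KEYS))
    chunk = piano_roll[t:t + LOOKAHEAD]
    window[:len(chunk)] = chunk
    return window


class ToyGoalConditionedPolicy:
    """Stand-in policy: a random linear map followed by tanh (assumption, not the paper's network)."""

    def __init__(self, rng):
        in_dim = STATE_DIM + LOOKAHEAD * NUM_KEYS
        self.weights = rng.normal(scale=0.01, size=(in_dim, ACTION_HORIZON * ACTION_DIM))

    def act(self, state, goal):
        # Condition on both the robot state and the flattened goal trajectory.
        features = np.concatenate([state, goal.ravel()])
        flat = np.tanh(features @ self.weights)
        return flat.reshape(ACTION_HORIZON, ACTION_DIM)


rng = np.random.default_rng(0)
piano_roll = np.zeros((10, NUM_KEYS))   # 10 timesteps, one row per timestep
piano_roll[9, 39] = 1.0                 # press key index 39 (middle C) at the last step

policy = ToyGoalConditionedPolicy(rng)
goal = goal_window(piano_roll, t=8)     # rows 8 and 9 of the song, then zero padding
actions = policy.act(np.zeros(STATE_DIM), goal)
```

Calling `act` at every timestep with a sliding goal window gives the pattern described above: the policy always sees the next few target key presses and returns a short action trajectory.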

The PianoMime framework consists of three main phases: data preparation, policy learning, and policy distillation.

To train the agent, the researchers used a combination of reinforcement learning and imitation learning. They developed song-specific expert policies using reinforcement learning with YouTube demonstrations and then distilled these policies into a generalist behavioral cloning policy.
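The distillation step can be illustrated with a toy example. This is only a sketch under strong assumptions: the "experts" here are hand-written linear functions rather than reinforcement-learned policies, the student is fit by ordinary least squares rather than by training a neural network, and all names and dimensions are hypothetical.

```python
import numpy as np

STATE_DIM, GOAL_DIM, ACTION_DIM = 3, 2, 4   # hypothetical toy sizes


def rollout(expert, goals, rng):
    """Query one song-specific expert on random states paired with its song's goals."""
    states = rng.normal(size=(len(goals), STATE_DIM))
    obs = np.hstack([states, goals])
    actions = np.array([expert(o) for o in obs])
    return obs, actions


def distill(datasets):
    """Behavioral cloning as regression: fit one linear student to all expert data."""
    obs = np.vstack([o for o, _ in datasets])
    actions = np.vstack([a for _, a in datasets])
    weights, *_ = np.linalg.lstsq(obs, actions, rcond=None)
    return weights   # shape (STATE_DIM + GOAL_DIM, ACTION_DIM)


rng = np.random.default_rng(0)

# Toy assumption: both experts apply the same hidden linear map, but each is
# only ever queried on the goals of its own song.
hidden_map = rng.normal(size=(STATE_DIM + GOAL_DIM, ACTION_DIM))
expert_a = lambda o: o @ hidden_map
expert_b = lambda o: o @ hidden_map

goals_a = rng.normal(size=(40, GOAL_DIM))          # "song A" goals
goals_b = rng.normal(size=(40, GOAL_DIM)) + 2.0    # "song B" goals, shifted distribution

student = distill([rollout(expert_a, goals_a, rng),
                   rollout(expert_b, goals_b, rng)])

# The single student can now be queried on an observation from neither dataset.
unseen_obs = rng.normal(size=STATE_DIM + GOAL_DIM)
student_action = unseen_obs @ student
```

In this toy setting the student recovers the experts' behavior exactly because the experts are linear. With real neural experts and a finite set of songs, the fit is approximate, which is why generalization to unseen songs has to be measured empirically.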

The team experimented with different architectural design strategies to model the behavioral cloning policy. They also investigated the effectiveness of a hierarchical policy that combines a high-level policy for generating fingertip trajectories with a learned cross-domain inverse dynamics model for generating joint-space actions.
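The hierarchical design can be sketched as a composition of two functions. Everything below is a hypothetical stand-in: the toy high-level policy maps a key index directly to a fingertip target, and the toy inverse model assumes a three-axis mechanism whose joint coordinates equal the fingertip position, which is far simpler than a multi-fingered hand or a learned model.

```python
import numpy as np

KEY_SPACING = 0.02   # hypothetical distance between neighbouring keys, in metres
PRESS_DEPTH = 0.01   # hypothetical key travel, in metres


def toy_high_level(key_index):
    """High-level policy stand-in: target key -> fingertip target position (x, y, z)."""
    return np.array([key_index * KEY_SPACING, 0.0, -PRESS_DEPTH])


def toy_inverse_model(joint_state, fingertip_target, gain=1.0):
    """Inverse-model stand-in: returns a joint-space displacement.

    Assumes the joints are the fingertip coordinates themselves, so the
    displacement is just the scaled position error.
    """
    return gain * (fingertip_target - joint_state)


class HierarchicalPolicy:
    """Compose a fingertip-space planner with a joint-space inverse model."""

    def __init__(self, high_level, inverse_model):
        self.high_level = high_level
        self.inverse_model = inverse_model

    def act(self, joint_state, goal):
        fingertip_target = self.high_level(goal)
        return self.inverse_model(joint_state, fingertip_target)


policy = HierarchicalPolicy(toy_high_level, toy_inverse_model)
joints = np.zeros(3)
action = policy.act(joints, goal=10)   # move toward key index 10
new_joints = joints + action
```

The appeal of this split is that the high-level policy works in fingertip space, where human demonstrations are observed, while the inverse model handles the robot-specific mapping to joint-space actions.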

Empirical results show that the PianoMime agent can play new songs not included in the training set with an F1-score of up to approximately 56%. This demonstrates the significant potential of using internet data for training generalist robotic agents.
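For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision (the share of pressed keys that were correct) and recall (the share of required keys that were pressed). The sketch below pools counts over a whole song; this aggregation and the function name are assumptions made for illustration, and the paper's evaluation code may compute the score differently.

```python
def key_press_f1(target, played):
    """F1 over key presses, pooling counts across all timesteps.

    target, played: lists of sets of key indices, one set per timestep.
    Returns 0.0 when no key press is correct.
    """
    if len(target) != len(played):
        raise ValueError("target and played must cover the same timesteps")

    true_pos = false_pos = false_neg = 0
    for want, got in zip(target, played):
        true_pos += len(want & got)    # required and pressed
        false_pos += len(got - want)   # pressed but not required
        false_neg += len(want - got)   # required but missed

    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Four timesteps: one missed key at step 1, one wrong key at step 2.
target = [{60}, {60, 64}, set(), {67}]
played = [{60}, {60}, {62}, {67}]
score = key_press_f1(target, played)   # precision 3/4, recall 3/4
```

Here three of the four pressed keys are correct and three of the four required keys are pressed, so precision and recall are both 0.75 and the F1-score is 0.75.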

Project website: https://pianomime.github.io/. The paper PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations is on arXiv.


Author: Hecate He | Editor: Chain Zhang
