
DeepMind’s Fictitious Co-Play Trains RL Agents to Collaborate with Novel Humans Without Using Human Data

A DeepMind research team explores the problem of how to train agents to collaborate well with novel human partners without using human data and presents Fictitious Co-Play (FCP), a surprisingly simple approach designed to address this challenge.

Generating AI agents that can collaborate effectively with humans is a long-standing challenge in the broad field of human-machine interaction. Recent studies have shown that it is possible for agents to reach human-level performance using model-free reinforcement learning (RL) without human data via self-play, where an agent learns from repeated games played against copies and variants of itself. A question naturally arises: Can such techniques progress to the point where they produce agents that are able to collaborate effectively with a wide range of novel human partners?

In the new paper Collaborating With Humans Without Human Data, a DeepMind research team explores how to train agents that collaborate effectively with diverse human partners without using human data. The team proposes Fictitious Co-Play (FCP), a surprisingly simple approach designed to address this challenge.

The researchers summarize the main contributions of their study as:

  1. We propose Fictitious Co-Play (FCP) to train agents capable of zero-shot coordination with humans.
  2. We demonstrate that FCP agents generalize better than SP (self-play), PP (population play), and BCP (behavioural cloning play) in zero-shot coordination with a variety of held-out agents.
  3. We propose a rigorous human-agent interaction study with behavioural analysis and participant feedback.
  4. We demonstrate that FCP significantly outperforms the BCP state-of-the-art, both in task score and in human partner preference.

There are two principal challenges in developing AI agents that collaborate with novel partners. The first is dealing with symmetries: in zero-shot settings, agents miscoordinate when they have no good way to break a symmetry. A familiar example is two people approaching each other on a sidewalk, where agents A and B are on a collision course and can avoid each other by moving either left or right. Both choices are valid solutions, but only if the two agree, so a good agent is expected to observe the human’s direction preference and adapt to it. The second is dealing with variations in skill level: a good agent should be able to assist and collaborate with highly skilled partners as well as partners who are still learning.
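The symmetry problem can be made concrete with a toy version of the sidewalk scenario. The sketch below is purely illustrative and is not taken from the paper: the payoff function, the two fixed conventions and the copy-the-partner agent are all assumed names and simplifications. Each walker picks "left" or "right" in its own frame, and the pair passes safely only when both pick the same side.

```python
from itertools import product

def sidewalk_payoff(a, b):
    """Return 1 if the walkers pass each other, 0 if they collide.

    Actions are in each walker's own frame, so both stepping to their
    own left (or both to their own right) avoids the collision.
    """
    return 1 if a == b else 0

# Two equally valid conventions that independent training could produce
# (hypothetical names, for illustration only).
conventions = {"always_left": "left", "always_right": "right"}

def cross_play_table(conventions):
    """Score every pairing of conventions, including each with itself."""
    return {
        (p, q): sidewalk_payoff(conventions[p], conventions[q])
        for p, q in product(conventions, repeat=2)
    }

def adaptive_score(partner_action, steps=5):
    """Total payoff of an agent that starts on the left, then copies
    whatever its fixed-convention partner did on the previous step."""
    my_action = "left"
    score = 0
    for _ in range(steps):
        score += sidewalk_payoff(my_action, partner_action)
        my_action = partner_action  # adapt to the partner's preference
    return score

table = cross_play_table(conventions)
```

Each convention scores perfectly when paired with itself, but the two conventions fail when paired with each other. The adaptive agent loses at most the first step against either convention, which is the kind of behaviour a good collaborator is expected to show.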

Fictitious Co-Play (FCP) is a simple two-stage approach designed to address both challenges. In the first stage, the team trains a diverse pool of self-play agents (partners) to represent different symmetry-breaking conventions. These partners are trained independently, so each can arrive at its own arbitrary conventions for breaking symmetries. To make the pool represent different skill levels as well, the team keeps multiple checkpoints of each self-play partner. The final checkpoint represents a fully trained “skillful” partner, while the earlier checkpoints represent “less skillful” partners.
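A minimal sketch of this first stage, under strong simplifying assumptions: each partner is reduced to a single number, its probability of stepping left in a two-walker sidewalk game, and "self-play training" is plain gradient ascent on the chance that two copies of the partner choose the same side. The function names, hyperparameters and dictionary layout are hypothetical and do not come from the paper, which trains deep RL agents in Overcooked.

```python
import random

def self_play_value(p_left):
    """Probability that two copies of the same policy pick the same side."""
    return p_left * p_left + (1 - p_left) * (1 - p_left)

def train_self_play_partner(seed, steps=40, lr=0.05, checkpoint_every=20):
    """Train one partner by gradient ascent on its self-play value.

    Returns the saved checkpoints: the initial policy, then one policy
    every `checkpoint_every` steps. The seed only changes the starting
    point, which decides whether the partner drifts toward "left" or
    toward "right".
    """
    rng = random.Random(seed)
    p_left = rng.uniform(0.3, 0.7)
    checkpoints = [p_left]
    for step in range(1, steps + 1):
        grad = 4 * p_left - 2  # derivative of self_play_value
        p_left = min(1.0, max(0.0, p_left + lr * grad))
        if step % checkpoint_every == 0:
            checkpoints.append(p_left)
    return checkpoints

def build_partner_pool(seeds):
    """Collect every checkpoint of every independently trained partner."""
    pool = []
    for seed in seeds:
        for index, p_left in enumerate(train_self_play_partner(seed)):
            pool.append({"seed": seed, "checkpoint": index, "p_left": p_left})
    return pool

pool = build_partner_pool(seeds=[0, 1, 2, 3])
```

In this toy, later checkpoints of a partner never coordinate worse with themselves than earlier ones, so the pool mixes weaker and stronger versions of each convention, mirroring the role checkpoints play in the description above.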

In the second stage, the team trains an FCP agent as the best response to the pool of diverse partners from the first stage. The partner parameters are frozen, forcing the FCP agent to learn to adapt to its partners rather than expecting partners to adapt to it.
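The second stage can be sketched in the same toy spirit. The code below is an assumption-laden illustration, not the paper's training procedure: partners are frozen probabilities of stepping left, the learner sees one earlier move from its partner, and the best response is estimated by averaging rewards under uniform exploration rather than by deep RL. All names and the example pool are hypothetical.

```python
import random

ACTIONS = ("left", "right")

def sample_action(p_left, rng):
    """Draw a partner action from its fixed probability of going left."""
    return "left" if rng.random() < p_left else "right"

def train_best_response(frozen_pool, episodes=20000, seed=0):
    """Estimate the best reply to each observed partner move.

    `frozen_pool` holds partner policies that are only sampled from and
    never updated, so all adaptation has to happen in the learner.
    """
    rng = random.Random(seed)
    totals = {obs: {a: 0.0 for a in ACTIONS} for obs in ACTIONS}
    counts = {obs: {a: 0 for a in ACTIONS} for obs in ACTIONS}
    for _ in range(episodes):
        p_left = rng.choice(frozen_pool)          # pick a frozen partner
        observed = sample_action(p_left, rng)     # partner's earlier move
        partner_action = sample_action(p_left, rng)
        action = rng.choice(ACTIONS)              # uniform exploration
        reward = 1.0 if action == partner_action else 0.0
        totals[observed][action] += reward
        counts[observed][action] += 1
    policy = {}
    for obs in ACTIONS:
        means = {a: totals[obs][a] / max(1, counts[obs][a]) for a in ACTIONS}
        policy[obs] = max(means, key=means.get)   # greedy best response
    return policy

# Hypothetical pool: two fully trained conventions plus two noisier,
# earlier-checkpoint versions of them.
frozen_pool = [1.0, 0.0, 0.8, 0.2]
fcp_policy = train_best_response(frozen_pool)
```

Because no single fixed action works against every partner in the pool, the learned policy ends up conditioning on what the partner did, which is the adaptive behaviour that freezing the partners is meant to encourage.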

The team compared FCP agents to three baseline training methods: 1) Self-play (SP), where agents learn solely through interaction with themselves; 2) Population-play (PP), where a population of agents is co-trained through random pairings; and 3) Behavioural cloning play (BCP), where an agent is trained with a behavioural cloning (BC) model of a human. The testing environment was Overcooked, a two-player common-payoff game in which players must coordinate to cook and deliver soup.

The experiments yielded the following results and conclusions:

  1. FCP significantly outperformed all baselines.
  2. Training with past checkpoints is the most beneficial variation for performance.
  3. FCP coordinates best with humans, achieving the highest score across maps.
  4. Participants prefer FCP over all baselines.
  5. FCP exhibits the best movement coordination with humans.
  6. FCP’s preferences over cooking pots align best with those of humans.

Overall, the study shows that FCP agents achieve significantly higher performance than all baselines when partnered with both novel agents and human partners, and that humans report a strong subjective preference for partnering with FCP agents over the baselines. The team believes their work establishes a strong foundation for future research on human-agent collaboration.

The paper Collaborating With Humans Without Human Data is on arXiv.

Author: Hecate He | Editor: Michael Sarazen
