DeepMind & UCL Propose Neural Population Learning: An Efficient and General Framework That Learns Strategically Diverse Policies for Real-World Games

A research team from DeepMind and University College London proposes Neural Population Learning (NeuPL), an efficient and general framework that learns and represents diverse policies in symmetric zero-sum games within a single conditional network.

by Synced

2022-02-17


Since the genesis of modern AI, researchers have regarded the challenges of real-world strategy games as a convenient testbed for model development. Improving performance in such games requires learning not a single strategy but a population of strategies, typically through iterative training. This approach, however, has two problems: 1) under a finite budget, approximate best-response operators often leave the population filled with undertrained good-responses; 2) relearning basic skills at each iteration is wasteful and quickly becomes intractable as opponents grow stronger.

In a new paper, a research team from DeepMind and University College London proposes Neural Population Learning (NeuPL), an efficient and general framework that learns and represents diverse policies in symmetric zero-sum games and enables transfer learning across policies within a single conditional network.

The researchers cite the popular game “rock-paper-scissors,” where a population with two available strategies (rock, paper) will beat a singleton population (scissors) if both populations are revealed. This is reflected in the unifying population learning framework Policy-Space Response Oracles (PSRO, Lanctot et al., 2017), where, at each iteration, a new policy is trained to best-respond to a mixture over previous policies chosen by a meta-strategy solver. A PSRO variation was used to master the game of StarCraft in 2019.
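The loop described above can be sketched on rock-paper-scissors itself. This is a toy illustration rather than the paper's implementation: policies are plain actions, the "oracle" is an exact best response over a payoff table, and the names `psro_sketch` and `last_policy_solver` are hypothetical. The meta-strategy solver used here simply puts all weight on the newest policy (plain self-play), which is only one possible choice.

```python
# Toy PSRO-style loop on rock-paper-scissors (illustrative sketch only).
ROCK, PAPER, SCISSORS = 0, 1, 2
NAMES = ["rock", "paper", "scissors"]

# PAYOFF[a][b] is the payoff to a player choosing a against b.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def expected_payoff(action, population, weights):
    """Expected payoff of `action` against a weighted mixture of policies."""
    return sum(w * PAYOFF[action][opp] for opp, w in zip(population, weights))


def best_response(population, weights):
    """Exact best response; ties go to the lowest action index."""
    return max(range(3), key=lambda a: expected_payoff(a, population, weights))


def last_policy_solver(population):
    """Assumed meta-strategy solver: all weight on the newest policy."""
    return [0] * (len(population) - 1) + [1]


def psro_sketch(initial, iterations, solver=last_policy_solver):
    """Grow a population by repeatedly best-responding to a meta-strategy."""
    population = [initial]
    for _ in range(iterations):
        weights = solver(population)
        population.append(best_response(population, weights))
    return population


if __name__ == "__main__":
    # A {rock, paper} population can answer a revealed {scissors} with rock.
    print(max(PAYOFF[a][SCISSORS] for a in (ROCK, PAPER)))
    print([NAMES[a] for a in psro_sketch(ROCK, 3)])
```

Starting from rock, three iterations yield rock, paper, scissors, rock: each new policy beats the previous one, and the population cycles because the game has no single dominant strategy.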

Such iterative and isolated approaches from classic game theory, however, are fundamentally different from how humans learn diverse strategies: we develop new strategies through incremental innovation, revisiting and improving upon those we have already mastered. The proposed NeuPL framework aims to endow AI agents with similar capabilities by extending population learning to real-world games.

NeuPL was designed to satisfy two desiderata: 1) At convergence, the resulting population of policies should represent a sequence of iterative best-responses under reasonable conditions; 2) Transfer learning can occur across policies throughout training. This approach deviates from PSRO in several important ways:

NeuPL trains all unique policies concurrently and continually, so that no undertrained good-response enters the population prematurely because of early truncation.
NeuPL represents an entire population of policies via a shared conditional network with each policy conditioned on and optimized against a meta-game mixture strategy, enabling transfer learning across policies.
NeuPL allows for cyclic interaction graphs, beyond the scope of PSRO.
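A toy sketch may help make these three points concrete. Everything below is an assumption made for illustration and not the paper's code: the interaction graph is a list of rows, where row i holds the mixture weights that policy i is optimized against; an all-zero row marks a fixed seed policy; and `shared_policy` is a hypothetical stand-in for the single conditional network, since it is one function that serves every policy and differs only in the mixture it is conditioned on.

```python
# Toy illustration of concurrent training against an interaction graph.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def shared_policy(sigma_row, actions):
    """One function for all policies, conditioned on a mixture row.

    Returns the exact best response to the mixture `sigma_row` over the
    population's current actions (ties go to the lowest action index).
    """
    def value(a):
        return sum(w * PAYOFF[a][actions[j]] for j, w in enumerate(sigma_row))
    return max(range(3), key=value)


def concurrent_update(actions, graph):
    """Update every non-seed policy at once, using the current population."""
    return [
        shared_policy(row, actions) if any(row) else actions[i]
        for i, row in enumerate(graph)
    ]


def is_acyclic(graph):
    """Depth-first check for cycles among edges with positive weight."""
    n = len(graph)
    state = [0] * n  # 0 = unvisited, 1 = on the current path, 2 = finished

    def visit(i):
        if state[i] == 1:
            return False
        if state[i] == 2:
            return True
        state[i] = 1
        ok = all(visit(j) for j in range(n) if graph[i][j] > 0)
        state[i] = 2
        return ok

    return all(visit(i) for i in range(n))


# Chain graph: policy 1 responds to policy 0, policy 2 responds to policy 1.
CHAIN = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
# Cyclic graph: policy 0 additionally responds to policy 2.
CYCLE = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

if __name__ == "__main__":
    actions = [0, 0, 0]  # every policy starts as rock
    for _ in range(3):
        actions = concurrent_update(actions, CHAIN)
    print(actions, is_acyclic(CHAIN), is_acyclic(CYCLE))
```

With the chain graph, the population settles at rock, paper, scissors after two concurrent updates and stays there, which mirrors the idea that continued joint training ends in a sequence of iterated best responses. The cyclic graph is a case that a strictly sequential procedure cannot express, because no policy can be trained "first".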

The researchers also note that NeuPL is general, offers convergence guarantees to a population of best-responses under mild assumptions, and can improve model performance and efficiency across domains. Under the NeuPL framework, novel strategies become more accessible, not less, as the neural population expands.

To evaluate NeuPL’s effectiveness, the team conducted experiments across several domains using Maximum a Posteriori Policy Optimization (MPO, Abdolmaleki et al., 2018) as the underlying reinforcement learning (RL) algorithm.

The experiments validate NeuPL’s generality in two respects: it recovers the expected results of existing population learning algorithms on rock-paper-scissors, and it generalizes to the spatiotemporal, partially observed strategy game of running-with-scissors (Vezhnevets et al., 2020), where players must infer opponent behaviours through tactical interactions. The results also show that NeuPL induces skill transfer across policies, enabling the discovery of exploiters of strong opponents that would otherwise have been out of reach, and that it scales to the large-scale Game-of-Skills of MuJoCo Football (Liu et al., 2019), where a concise sequence of best-responses is learned, reflecting the prominent transitive skill dimension of the game.

The team regards their study as a step toward scalable policy space exploration, and suggests going beyond the symmetric zero-sum setting as a possible direction for future research in this area.

The paper NeuPL: Neural Population Learning is on arXiv.

Author: Hecate He | Editor: Michael Sarazen

