Oxford U & DeepMind Harness Cultural Accumulation in Reinforcement Learning

In a new paper Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning, a research team from the University of Oxford and Google DeepMind introduces methods to achieve cultural accumulation in Reinforcement Learning (RL) agents. This research opens new pathways for modeling human culture through artificial systems.

Cultural accumulation has been a driving force behind the open-ended and diverse advancements in human capabilities throughout history. By combining individual exploration with the inter-generational transmission of information, it builds an ever-expanding body of knowledge and skills. Considering the profound success of cultural accumulation in nature, exploring its applicability to artificial learning systems presents a promising yet under-researched direction.

The team highlights that the potential for RL agents to accumulate culture is largely untapped. Traditional RL approaches typically focus on improvements within a single lifetime. Existing generational algorithms fail to capture the open-ended, emergent nature of cultural accumulation, which allows for a balance between innovation and imitation.

Building on the established ability of RL agents to perform social learning, the researchers found that training setups balancing social and independent learning foster cultural accumulation. These accumulating agents outperform agents trained within a single lifetime, even when given an equivalent budget of cumulative experience.
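
To make the idea concrete, here is a minimal, runnable sketch of such a generational loop. Everything in it (the bandit task, the Agent class, the copying probability p_social, and the hyperparameters) is an illustrative assumption rather than the authors' setup: each new agent sometimes observes its predecessor acting (social learning) and otherwise explores on its own (independent learning), while its update rule uses only its own reward.

```python
# A minimal, runnable sketch (not the authors' code) of generational
# training that balances social and independent learning. The bandit
# task, agent, and hyperparameters are all illustrative assumptions.

import random

N_ARMS, BEST_ARM = 10, 7  # the rewarding arm is hidden from the agents

class Agent:
    def __init__(self):
        self.values = [0.0] * N_ARMS  # running value estimate per arm

    def act(self, hint=None, epsilon=0.1):
        if hint is not None and random.random() < 0.5:
            return hint                      # social: copy the demonstration
        if random.random() < epsilon:
            return random.randrange(N_ARMS)  # independent: explore
        return max(range(N_ARMS), key=lambda a: self.values[a])

    def update(self, action, reward, lr=0.1):
        self.values[action] += lr * (reward - self.values[action])

def train_generation(learner, predecessor=None, steps=300, p_social=0.5):
    for _ in range(steps):
        # With probability p_social the predecessor demonstrates a choice.
        hint = predecessor.act(epsilon=0.0) if (
            predecessor and random.random() < p_social) else None
        action = learner.act(hint)
        reward = 1.0 if action == BEST_ARM else 0.0
        learner.update(action, reward)  # driven only by the agent's own reward
    return learner

predecessor = None
for gen in range(5):
    predecessor = train_generation(Agent(), predecessor)
    best = predecessor.values.index(max(predecessor.values))
    print(f"generation {gen}: preferred arm = {best}")
```

Because each generation starts from scratch, any improvement over the previous one has to come from what is transmitted socially plus the agent's own exploration, which is exactly the innovation-imitation balance the paper targets.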

The researchers present two formulations of cultural accumulation in RL: in-context accumulation, which operates through fast adaptation to new environments, and in-weights accumulation, which operates through the slower process of updating network weights. The in-context setting is analogous to short-term knowledge accumulation, while the in-weights setting represents long-term, skill-based accumulation.
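
The value-learning sketch above stores what it has learned in parameters, loosely matching the in-weights flavour. For contrast, a hypothetical in-context counterpart would keep its policy fixed and carry knowledge only in a short-term memory seeded by its predecessor's demonstration. The toy below is again an illustrative assumption, not the paper's implementation:

```python
# In-context accumulation, toy version (illustrative assumption): no
# parameters are updated. Knowledge is handed down as a demonstrated arm
# and refined only in short-term memory within a single lifetime.

import random

N_ARMS, BEST_ARM = 10, 7  # same illustrative bandit as above

def in_context_lifetime(demonstrated_arm=None, steps=200):
    """One lifetime with frozen parameters: the only thing that adapts is
    a short-term memory, seeded by the arm the predecessor demonstrated."""
    memory = demonstrated_arm
    for _ in range(steps):
        if memory is not None and random.random() < 0.8:
            arm = memory                    # social: exploit inherited knowledge
        else:
            arm = random.randrange(N_ARMS)  # independent: explore
        if arm == BEST_ARM:
            memory = arm                    # reward updates memory, not weights
    return memory                           # handed to the next generation

memory = None
for gen in range(5):
    memory = in_context_lifetime(memory)
    print(f"generation {gen}: remembered arm = {memory}")
```

Here accumulation survives across generations only through what each lifetime passes on in context, whereas in the in-weights setting it survives in the trained parameters themselves.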

The effectiveness of both formulations is demonstrated through sustained generational performance gains on a variety of tasks requiring exploration under partial observability. In each task, accumulating agents outperformed agents learning within a single lifetime, even with the same total experience budget. Notably, this cultural accumulation emerged purely from individual agents maximizing their independent rewards, without any additional losses.

To the researchers’ knowledge, this work is the first to present general models achieving emergent cultural accumulation in reinforcement learning. This breakthrough opens up new avenues for creating more open-ended learning systems and offers fresh opportunities for modeling human culture.

The paper Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning is on arXiv.


Author: Hecate He | Editor: Chain Zhang
