In the new paper Large-Scale Retrieval for Reinforcement Learning, a DeepMind research team dramatically expands the information accessible to reinforcement learning (RL) agents, enabling them to attend to tens of millions of information pieces, incorporate new information without retraining, and learn decision making in an end-to-end manner.
In the new paper Factory: Fast Contact for Robotic Assembly, a research team from NVIDIA and the University of Washington introduces Factory, a set of physics simulation methods and robot learning tools for simulating contact-rich interactions in assembly with high accuracy, efficiency, and robustness.
In the new paper Rethinking Reinforcement Learning Based Logic Synthesis, a research team from Huawei Noah’s Ark Lab develops a novel reinforcement learning-based logic synthesis method to automatically recognize critical operators and produce common operator sequences that are generalizable to unseen circuits.
In the new paper AutoDIME: Automatic Design of Interesting Multi-Agent Environments, an OpenAI research team explores automatic environment design for multi-agent environments using an RL-trained teacher that samples environments to maximize student learning. The work demonstrates that intrinsic teacher rewards are a promising approach for automating both single and multi-agent environment design.
An OpenAI research team leverages reinforcement learning from human feedback (RLHF) to make significant progress on aligning language models with the users’ intentions. The proposed InstructGPT models are better at following instructions than GPT-3 while also more truthful and less toxic.
DeepMind researchers introduce Player of Games (PoG), a general-purpose algorithm that applies self-play learning, search, and game-theoretic reasoning to perfect and imperfect information games, taking an important step toward truly general algorithms for arbitrary environments.
In the new paper Understanding the World Through Action, UC Berkeley assistant professor in the department of electrical engineering and computer sciences Sergey Levine argues that a general, principled, and powerful framework for utilizing unlabelled data can be derived from reinforcement learning to enable machine learning systems leveraging large datasets to understand the real world.
DeepMind and Google Brain researchers and former World Chess Champion Vladimir Kramnik explore how human knowledge is acquired and how chess concepts are represented in the AlphaZero neural network via concept probing, behavioural analysis, and an examination of its activations.
In the paper Introducing Symmetries to Black Box Meta Reinforcement Learning, a research team from DeepMind and The Swiss AI Lab IDSIA explores the role of symmetries in meta generalization and shows that introducing more symmetries to black-box meta-learners can improve their ability to generalize to unseen action and observation spaces, tasks, and environments.
A research team from Carnegie Mellon University, Google Brain and UC Berkeley proposes a robust predictable control (RPC) method for learning reinforcement learning policies that use fewer bits of information. This simple and theoretically-justified algorithm achieves much tighter compression, is more robust, and generalizes better than prior methods, achieving up to 5× higher rewards than a standard information bottleneck.
In the paper ReGen: Reinforcement Learning for Text and Knowledge Base Generation Using Pretrained Language Models, IBM researchers present ReGen, a bidirectional generation of text and graph that leverages reinforcement learning to push the performance of text-to-graph and graph-to-text generation tasks to a higher level.
A research team from Stanford University introduces BEHAVIOR, a benchmark for embodied AI with 100 realistic, diverse and complex everyday household activities in simulation. BEHAVIOR addresses challenges such as definition, instantiation in a simulator, and evaluation; and pushes the state-of-the-art by adding new types of state changes.
A research team from Mila, McGill University, Université de Montréal, DeepMind and Microsoft proposes GFlowNet, a novel flow network-based generative method that can turn a given positive reward into a generative policy that samples with a probability proportional to the return.v
A research team from McGill University, Université de Montréal, DeepMind and Mila presents an end-to-end, model-based deep reinforcement learning (RL) agent that dynamically attends to relevant parts of its environments to facilitate out-of-distribution (OOD) and systematic generalization.
A research team from UC Berkeley, Facebook AI Research and Google Brain abstracts Reinforcement Learning (RL) as a sequence modelling problem. Their proposed Decision Transformer simply outputs optimal actions by leveraging a causally masked transformer, yet matches or exceeds state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
A research team from Google Brain conducts a comprehensive empirical study on more than fifty choices in a generic adversarial imitation learning framework and explores their impacts on large-scale (>500k trained agents) continuous-control tasks to provide practical insights and recommendations for designing novel and effective AIL algorithms.
A research team from the University of Montreal and Max Planck Institute for Intelligent Systems constructs a reinforcement learning agent whose knowledge and reward function can be reused across tasks, along with an attention mechanism that dynamically selects unchangeable knowledge pieces to enable out-of-distribution adaptation and generalization.
A research team from MIT and MIT-IBM Watson AI Lab proposes Curious Representation Learning (CRL), a framework that learns to understand the surrounding environment by training a reinforcement learning (RL) agent to maximize the error of a representation learner to gain an incentive to explore the environment.
Stanford researchers’ DERL (Deep Evolutionary Reinforcement Learning) is a novel computational framework that enables AI agents to evolve morphologies and learn challenging locomotion and manipulation tasks in complex environments using only low level egocentric sensory information.