In a new paper AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning, a DeepMind research team presents AlphaStar Unplugged, an unprecedented challenging large-scale offline reinforcement learning benchmark that leverages a offline dataset from StarCraft II for RL agents training.
In a new paper Bigger, Better, Faster: Human-level Atari with human-level efficiency, a research team from Google DeepMind, Mila and Universite de Montreal presents a value-based RL agent, which they call faster, better, faster (BBF), that achieves super-human performance on the Atari 100K benchmark on single GPU.
In the new paper Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets, an Oxford University research team introduces Deep Duelling Double Q-learning with the APEX architecture to train a trading agent to translate predictive signals into optimal limit order trading strategies.
In the new paper Adversarial Policies Beat Professional-Level Go AIs, a research team from MIT, UC Berkeley, and FAR AI employs a novel adversarial policy to attack the state-of-the-art AI Go system KataGo. The team believes theirs is the first successful end-to-end attack against an AI Go system playing at the level of a human professional.
In the new paper Human-level Atari 200x Faster, a DeepMind research team applies a set of diverse strategies to Agent57, with their resulting MEME (Efficient Memory-based Exploration) agent surpassing the human baseline on all 57 Atari games in just 390 million frames — two orders of magnitude faster than Agent57.
The new DeepMind paper Data Augmentation for Efficient Learning from Parametric Experts proposes Augmented Policy Cloning (APC), a simple yet effective data-augmentation approach designed to support data-efficient learning from parametric experts. The method significantly improves data efficiency across various control and reinforcement learning settings.
In the new paper MO2: Model-Based Offline Options, a DeepMind research team introduces Model-Based Offline Options (MO2), an offline hindsight bottleneck options framework that supports sample-efficient option discovery over continuous state-action spaces for efficient skill transfer to new tasks.
In the new paper Planning in Stochastic Environments with a Learned Model, a research team from DeepMind and University College London extends the deterministic MuZero model to Stochastic MuZero for stochastic model learning, achieving performance comparable or superior to state-of-the-art methods in complex single- and multi-agent environments.
In the new paper DayDreamer: World Models for Physical Robot Learning, researchers from the University of California, Berkeley leverage recent advances in the Dreamer world model to enable online reinforcement learning for robot training without simulators or demonstrations, establishing a strong baseline for efficient real-world robotic learning.
In the new paper Large-Scale Retrieval for Reinforcement Learning, a DeepMind research team dramatically expands the information accessible to reinforcement learning (RL) agents, enabling them to attend to tens of millions of information pieces, incorporate new information without retraining, and learn decision making in an end-to-end manner.
In the new paper Factory: Fast Contact for Robotic Assembly, a research team from NVIDIA and the University of Washington introduces Factory, a set of physics simulation methods and robot learning tools for simulating contact-rich interactions in assembly with high accuracy, efficiency, and robustness.
In the new paper Rethinking Reinforcement Learning Based Logic Synthesis, a research team from Huawei Noah’s Ark Lab develops a novel reinforcement learning-based logic synthesis method to automatically recognize critical operators and produce common operator sequences that are generalizable to unseen circuits.
In the new paper AutoDIME: Automatic Design of Interesting Multi-Agent Environments, an OpenAI research team explores automatic environment design for multi-agent environments using an RL-trained teacher that samples environments to maximize student learning. The work demonstrates that intrinsic teacher rewards are a promising approach for automating both single and multi-agent environment design.
An OpenAI research team leverages reinforcement learning from human feedback (RLHF) to make significant progress on aligning language models with the users’ intentions. The proposed InstructGPT models are better at following instructions than GPT-3 while also more truthful and less toxic.
DeepMind researchers introduce Player of Games (PoG), a general-purpose algorithm that applies self-play learning, search, and game-theoretic reasoning to perfect and imperfect information games, taking an important step toward truly general algorithms for arbitrary environments.
In the new paper Understanding the World Through Action, UC Berkeley assistant professor in the department of electrical engineering and computer sciences Sergey Levine argues that a general, principled, and powerful framework for utilizing unlabelled data can be derived from reinforcement learning to enable machine learning systems leveraging large datasets to understand the real world.
DeepMind and Google Brain researchers and former World Chess Champion Vladimir Kramnik explore how human knowledge is acquired and how chess concepts are represented in the AlphaZero neural network via concept probing, behavioural analysis, and an examination of its activations.
In the paper Introducing Symmetries to Black Box Meta Reinforcement Learning, a research team from DeepMind and The Swiss AI Lab IDSIA explores the role of symmetries in meta generalization and shows that introducing more symmetries to black-box meta-learners can improve their ability to generalize to unseen action and observation spaces, tasks, and environments.
A research team from Carnegie Mellon University, Google Brain and UC Berkeley proposes a robust predictable control (RPC) method for learning reinforcement learning policies that use fewer bits of information. This simple and theoretically-justified algorithm achieves much tighter compression, is more robust, and generalizes better than prior methods, achieving up to 5× higher rewards than a standard information bottleneck.
In the paper ReGen: Reinforcement Learning for Text and Knowledge Base Generation Using Pretrained Language Models, IBM researchers present ReGen, a bidirectional generation of text and graph that leverages reinforcement learning to push the performance of text-to-graph and graph-to-text generation tasks to a higher level.
A research team from Stanford University introduces BEHAVIOR, a benchmark for embodied AI with 100 realistic, diverse and complex everyday household activities in simulation. BEHAVIOR addresses challenges such as definition, instantiation in a simulator, and evaluation; and pushes the state-of-the-art by adding new types of state changes.
A research team from Mila, McGill University, Université de Montréal, DeepMind and Microsoft proposes GFlowNet, a novel flow network-based generative method that can turn a given positive reward into a generative policy that samples with a probability proportional to the return.v
A research team from McGill University, Université de Montréal, DeepMind and Mila presents an end-to-end, model-based deep reinforcement learning (RL) agent that dynamically attends to relevant parts of its environments to facilitate out-of-distribution (OOD) and systematic generalization.
A research team from UC Berkeley, Facebook AI Research and Google Brain abstracts Reinforcement Learning (RL) as a sequence modelling problem. Their proposed Decision Transformer simply outputs optimal actions by leveraging a causally masked transformer, yet matches or exceeds state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
A research team from Google Brain conducts a comprehensive empirical study on more than fifty choices in a generic adversarial imitation learning framework and explores their impacts on large-scale (>500k trained agents) continuous-control tasks to provide practical insights and recommendations for designing novel and effective AIL algorithms.
A research team from the University of Montreal and Max Planck Institute for Intelligent Systems constructs a reinforcement learning agent whose knowledge and reward function can be reused across tasks, along with an attention mechanism that dynamically selects unchangeable knowledge pieces to enable out-of-distribution adaptation and generalization.
A research team from MIT and MIT-IBM Watson AI Lab proposes Curious Representation Learning (CRL), a framework that learns to understand the surrounding environment by training a reinforcement learning (RL) agent to maximize the error of a representation learner to gain an incentive to explore the environment.
Stanford researchers’ DERL (Deep Evolutionary Reinforcement Learning) is a novel computational framework that enables AI agents to evolve morphologies and learn challenging locomotion and manipulation tasks in complex environments using only low level egocentric sensory information.