An Apple research team presents Large LAnguage model Reinforcement Learning Policy (LLaRP). LLaRP effectively repurposes LLMs for Reinforcement Learning (RL) challenges within the realm of Embodied Artificial Intelligence (AI), achieving a remarkable 1.7 times higher success rate compared to other established baselines and zero-shot LLM applications.
In a new paper Diversifying AI: Towards Creative Chess with AlphaZero, a Google DeepMind research team explores whether artificial intelligence can benefit from creative problem-solving mechanisms identified in human intelligence while pushing to the limits of its computational rationality.
In a new paper DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales, a Deepspeed of Microsoft research team presents DeepSpeed-Chat, a novel end-to-end RLHF pipeline that provides easy-to-use training and inference for ChatGPT-like models at scale.
In a new paper A Definition of Continual Reinforcement LearningA Definition of Continual Reinforcement Learning, a DeepMind research team rethinks RL problems as endless adaptation and provides a clean, general, precise mathematical definition of continual reinforcement learning (CRL), aiming to promote researches on CRL from a solid conceptual foundation.