Apple Repurposes Large Language Models for Reinforcement Learning challenges in Embodied AI

Large Language Models (LLMs) have demonstrated strong language understanding capabilities, raising the possibility of harnessing them for complex embodied visual tasks. The open question is whether these models can serve as the basis for adaptable, generalizable decision-making policies that transfer to novel scenarios.

In a new paper, Large Language Models as Generalizable Policies for Embodied Tasks, an Apple research team presents Large LAnguage model Reinforcement Learning Policy (LLaRP). LLaRP repurposes LLMs for Reinforcement Learning (RL) challenges in Embodied Artificial Intelligence (AI), achieving roughly 1.7 times the success rate of established baselines and zero-shot LLM applications.

The LLaRP approach adapts pre-trained LLMs to the multi-modal decision-making settings inherent to embodied tasks. The problem is cast as a Partially-Observable Markov Decision Process (POMDP), in which the policy’s inputs are the task instruction and egocentric visual RGB frames from the current time step. The instruction is encoded with the LLM’s embeddings and the frames with a vision encoder. The resulting embeddings are fed to a pre-trained LLM, whose hidden outputs are projected to action and value predictions. The entire system learns through online RL; only the action output module and the observation encoder MLP are trained, while all other components remain frozen.
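The data flow described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the layer sizes, the four-word vocabulary, the single-layer adapter, and the averaging "backbone" that stands in for a real LLM are all hypothetical. What it is meant to show is the split between frozen and trainable parts.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the toy weights are reproducible

HIDDEN, VIS_DIM, N_PIXELS, N_ACTIONS = 8, 6, 12, 4  # hypothetical toy sizes
VOCAB = {"pick": 0, "up": 1, "the": 2, "apple": 3}   # hypothetical vocabulary


def rand_matrix(rows, cols):
    """Random weights as a list of `rows` lists, each of length `cols`."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


# Frozen components: online RL would never update these.
frozen = {
    "token_embeddings": rand_matrix(len(VOCAB), HIDDEN),  # one row per token
    "vision_encoder": rand_matrix(VIS_DIM, N_PIXELS),
    "llm_backbone": rand_matrix(HIDDEN, HIDDEN),          # stand-in for an LLM
}

# Trainable components: the only weights online RL would update.
trainable = {
    "obs_adapter": rand_matrix(HIDDEN, VIS_DIM),
    "output_module": {
        "action": rand_matrix(N_ACTIONS, HIDDEN),
        "value": rand_matrix(1, HIDDEN),
    },
}


def policy_step(instruction, frame):
    """Map an instruction and one flattened RGB frame to (logits, value)."""
    # 1. Instruction tokens -> frozen LLM token embeddings.
    sequence = [frozen["token_embeddings"][VOCAB[w]] for w in instruction.split()]
    # 2. Frame -> frozen vision encoder -> trainable adapter into LLM space.
    features = matvec(frozen["vision_encoder"], frame)
    adapted = matvec(trainable["obs_adapter"], features)
    sequence.append([math.tanh(x) for x in adapted])
    # 3. Frozen backbone stand-in: average the sequence, then one tanh layer.
    pooled = [sum(col) / len(sequence) for col in zip(*sequence)]
    hidden = [math.tanh(x) for x in matvec(frozen["llm_backbone"], pooled)]
    # 4. Trainable output module -> action logits and a scalar value estimate.
    logits = matvec(trainable["output_module"]["action"], hidden)
    value = matvec(trainable["output_module"]["value"], hidden)[0]
    return logits, value


frame = [rng.random() for _ in range(N_PIXELS)]
logits, value = policy_step("pick up the apple", frame)
print(len(logits), sorted(trainable))
```

In a real system the backbone would be a pre-trained transformer and the gradients from the RL loss would flow only into the entries of `trainable`.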

The research team demonstrates that using a pre-trained and frozen LLM as a Vision-Language Model (VLM) policy, with learned input and output adapter layers, yields a policy that generalizes robustly. This policy is trained using online RL, and its generalization is assessed along two axes: Paraphrastic Robustness (PR) and Behavior Generalization (BG).

LLaRP is evaluated on more than 1,000 unseen tasks spanning the PR and BG axes and achieves a 42% success rate, compared with 25% for alternative LSTM-based policies and 22% for zero-shot LLM applications. LLaRP outperforms all baselines both when given novel instructions and when assigned previously unseen tasks. The researchers also show that the LLaRP LLM-based policy provides a significant performance boost over a Transformer baseline in a distinct domain, Atari.
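As a quick sanity check, the 1.7 times figure quoted earlier can be recomputed from the success rates reported in this paragraph. The snippet below only does that arithmetic; the dictionary keys are labels chosen here for illustration.

```python
# Success rates as reported in the article (fractions of unseen tasks solved).
success_rate = {
    "LLaRP": 0.42,
    "LSTM-based policy": 0.25,
    "zero-shot LLM": 0.22,
}

# Ratio of LLaRP's success rate to each baseline's success rate.
ratios = {
    name: success_rate["LLaRP"] / rate
    for name, rate in success_rate.items()
    if name != "LLaRP"
}

for name, ratio in ratios.items():
    print(f"LLaRP vs {name}: {ratio:.2f}x")
```

The ratio against the LSTM-based policy comes out at about 1.68, which rounds to the quoted 1.7; against the zero-shot LLM baseline it is somewhat higher.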

The research team further uncovers the benefits of infusing LLM-encoded world knowledge into RL. LLM-based models are more sample efficient than conventional architectures in both basic Proximal Policy Optimization (PPO) RL and continual learning settings. Furthermore, LLaRP requires less supervision than commonly used imitation learning techniques.

To facilitate further exploration of generalization in Embodied AI, the researchers introduce the Language Rearrangement task. This task comprises 150,000 distinct language instructions, each equipped with automatically generated rewards, providing a valuable framework for ongoing research in the field.

In conclusion, the paper illustrates the potential of integrating LLMs into embodied tasks. LLaRP not only achieves higher success rates than the baselines but also improves sample and supervision efficiency, opening up possibilities for the future of Embodied AI research and development.

Video examples of LLaRP on unseen Language Rearrangement instructions are at https://llm-rl.github.io. The paper Large Language Models as Generalizable Policies for Embodied Tasks is on arXiv.

Author: Hecate He | Editor: Chain Zhang


4 comments on “Apple Repurposes Large Language Models for Reinforcement Learning challenges in Embodied AI”

Henry Larry

2023-11-24

Fascinating approach by Apple’s team! Leveraging LLMs for RL in Embodied AI showcases promising advancements, achieving an impressive success rate—truly innovative work!

annatt46

2024-02-06

Large language models, such as GPT (Generative Pre-trained Transformer) models, have shown versatility in understanding and generating human-like text across diverse domains.



jack in the box

2026-05-18

Really interesting research on how Apple is adapting large language models for embodied AI and reinforcement learning tasks. The progress in combining language understanding with real-world decision-making could shape the future of robotics and smart assistants.
