AI Machine Learning & Data Science Research

Stanford’s BEHAVIOR Benchmarks 100 Activities From Everyday Life for Embodied AI

A research team from Stanford University introduces BEHAVIOR, a benchmark for embodied AI with 100 realistic, diverse and complex everyday household activities in simulation. BEHAVIOR addresses challenges such as definition, instantiation in a simulator, and evaluation; and pushes the state-of-the-art by adding new types of state changes.

In a bid to advance the growing field of embodied AI, Stanford University researchers have introduced BEHAVIOR, a benchmark comprising 100 realistic, diverse, and complex everyday household activities in simulation. BEHAVIOR addresses the core challenges of activity definition, instantiation in a simulator, and evaluation; and pushes the state of the art by supporting new types of state changes such as cleaning surfaces or changing object temperatures.

Embodied AI refers to the study and development of agents that can perceive, reason, and interact with the environment with the capabilities and limitations of a physical body. Ideally, such robots can also generalize to new and changing tasks and environments. Significant progress has been made in this field in recent years, and BEHAVIOR (Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments) is designed to meet the need for advanced benchmarking of embodied AI in simulated environments.

Successful embodied AI benchmarks should accommodate more realistic, diverse, and complex activities, and this involves three main concepts: 1) Definition: identifying and defining meaningful activities for benchmarking; 2) Realization: developing simulated environments that realistically support such activities; and 3) Evaluation: defining success and objective metrics for evaluating performance.

To address these challenges, BEHAVIOR introduces three technical innovations:

  1. BEHAVIOR Domain Definition Language (BDDL): A representation adapted from predicate logic that maps simulated states to semantic symbols. BDDL defines each of the 100 activities as a set of initial and goal conditions, enabling the generation of potentially infinite initial states and admitting many valid ways of reaching the goal states.
  2. Realistic simulation: The team provides BEHAVIOR implementation in iGibson 2.0, generating potentially infinite diverse activity instances in realistic home scenes.
  3. A comprehensive set of metrics: The team proposes a set of metrics relative to demonstrated human performance on each activity, and provides a large-scale dataset of 500 human demonstrations in virtual reality.
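The core idea behind BDDL and the benchmark's metrics can be illustrated with a minimal sketch. All predicate, object, and function names below are hypothetical, chosen for illustration; the actual BDDL schema and BEHAVIOR's metric definitions are specified in the paper and the iGibson 2.0 implementation.

```python
# Toy sketch of BDDL's core idea: an activity is a set of logical
# conditions over predicates grounded in simulator state, and progress
# can be scored as the fraction of goal literals satisfied.
# Predicate and object names here are illustrative, not actual BDDL.

# Simulated world state: predicate name -> set of object tuples it holds for.
world = {
    "ontop": {("apple", "table")},
    "inside": set(),
    "open": {("fridge",)},
}

def holds(state, predicate, *args):
    """Check whether a grounded predicate is true in the current state."""
    return tuple(args) in state.get(predicate, set())

def satisfied(state, literal):
    """Evaluate one goal literal, supporting simple negation."""
    if literal[0] == "not":
        return not holds(state, *literal[1:])
    return holds(state, *literal)

def success_score(state, goal):
    """Fraction of goal literals satisfied; a human-centric metric could
    normalize scores like this against VR demonstrations."""
    return sum(satisfied(state, g) for g in goal) / len(goal)

# Goal conditions for a toy "putting away groceries" activity.
goal = [
    ("inside", "apple", "fridge"),
    ("not", "open", "fridge"),
]

# Agent acts: put the apple in the fridge, then close the fridge.
world["ontop"].discard(("apple", "table"))
world["inside"].add(("apple", "fridge"))
world["open"].discard(("fridge",))

print(success_score(world, goal))  # 1.0
```

Because the conditions are symbolic rather than tied to one concrete scene, many different initial object placements can instantiate the same activity, which is what allows the benchmark to generate diverse activity instances.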

Through the application of these technical innovations, BEHAVIOR reaches new levels of realism, diversity, and complexity in its 100 household activities (cleaning, packing, preparing food, etc.) with a new logic-symbolic representation, a fully functional simulation-based implementation, and a set of human-centric metrics based on the performance of humans on the same activities in VR.

To demonstrate the challenges imposed by the BEHAVIOR benchmark, the team conducted a series of experiments, applying the state-of-the-art embodied AI solutions Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) to BEHAVIOR.

The researchers evaluated the impact of activity complexity (time horizon) on robot learning performance. Even in the simplest conditions, agents failed on all but one activity (bringingInWood, Q = 0.13). In an oracle-driven test, most learning agents failed to achieve the tasks even when starting only one second away from a goal state, suggesting that embodied AI solutions with a hierarchical structure may be needed to overcome the long horizons of BEHAVIOR activities. The team also evaluated the effect of realism in sensing and actuation, noting that even with full observability, activity complexity overwhelmed policies learned in the original action space, and most learning agents failed to accomplish any part of the activities.
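Why long horizons are so punishing for flat RL agents can be seen with a back-of-the-envelope sketch. This is an illustrative toy model, not an analysis from the paper: in a task where only one action sequence of length H succeeds, the chance that undirected exploration ever completes it decays exponentially in H.

```python
# Illustrative toy model: a "chain" task with `n_actions` choices per step
# where exactly one action sequence of length `horizon` reaches the goal.
# A uniform-random policy succeeds in one episode with probability
# (1 / n_actions) ** horizon, which vanishes quickly as the horizon grows.

def random_success_prob(n_actions, horizon):
    """Probability that a uniform-random policy executes the single
    correct action sequence of length `horizon` in one episode."""
    return (1.0 / n_actions) ** horizon

# Even with only two actions, long activities are nearly unreachable
# by chance, which is why exploration (and reward signal) dries up.
for h in (1, 5, 20, 50):
    print(h, random_success_prob(2, h))
```

Hierarchical approaches shorten the effective horizon by composing a few high-level decisions out of reusable low-level skills, which is consistent with the paper's observation that hierarchy may be needed for BEHAVIOR-scale activities.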

Finally, the team evaluated the effects of BEHAVIOR’s diversity in dimensions such as scenes, objects, and initial states. In these experiments, the performance in all activities decreased rapidly, indicating that BEHAVIOR’s diversity exceeds what current RL algorithms can handle even with regard to simplified activities.

Overall, the study sheds light on the distinct challenges that realism, diversity, and complexity pose for embodied AI benchmarks. The Stanford Vision and Learning Lab will open-source BEHAVIOR in the hope that it can help advance research in this field and improve embodied AI performance.

The paper BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments is on arXiv.


Author: Hecate He | Editor: Michael Sarazen, Chain Zhang


We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
