Researchers from Stanford University and the University of California, Berkeley have introduced Gibson Environment, a real-world-based virtual environment for training and testing active perception agents.
It is virtually impossible for developers to train and test visual perception models in the real world: either the algorithms are not fast enough to learn in real time, or the robots required are prohibitively expensive or fragile. Using Gibson, an AI agent can explore its environment and take actions accordingly, all without breaking anything.
A major challenge for active agents learning in simulation is transferring results to the real world. To smooth this transition, the Gibson Environment was built by virtualizing real-world spaces instead of using artificially designed ones.
Gibson Environment is named after James J. Gibson, author of the influential 1979 book The Ecological Approach to Visual Perception, who wrote “We must perceive in order to move, but we must also move in order to perceive.”
Agents receive visual observations and perform physical tasks in the rendered environment. Agent mobility is limited to the given space and subject to the laws of physics. The agent can be a car or a humanoid, and its visual perception appears as if captured by an onboard camera.
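This observe-then-act cycle can be sketched as a simple interaction loop. The `DummyEnv` class below is an illustrative stand-in, not Gibson's actual API: it fakes camera observations and rewards, while a real Gibson environment would return rendered frames constrained by physics.

```python
import random

class DummyEnv:
    """Stand-in for a Gibson-style environment with camera-like observations."""

    def __init__(self, height=4, width=4, max_steps=10):
        self.h, self.w = height, width
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.steps = 0
        return self._observe()

    def step(self, action):
        """Apply an action; in Gibson, physics would constrain the result."""
        self.steps += 1
        obs = self._observe()
        reward = 1.0 if action == "forward" else 0.0
        done = self.steps >= self.max_steps
        return obs, reward, done

    def _observe(self):
        # A fake "camera frame": an h x w grid of pixel intensities.
        return [[random.random() for _ in range(self.w)] for _ in range(self.h)]


def run_episode(env, policy):
    """Run one observe-act loop until the episode ends; return total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(obs)          # agent decides from its camera view
        obs, reward, done = env.step(action)
        total += reward
    return total


total = run_episode(DummyEnv(), lambda obs: "forward")
```

The policy here is a trivial constant function; in practice it would be a trained perception model mapping the observed frame to an action.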
The researchers proposed three steps for transferring trained models to the real world. First, present the world without sacrificing semantic complexity, by basing it on scanned real environments instead of artificial ones. Second, close the gap between Gibson's renderings and captures from a real camera. Finally, demonstrate the agent's ability to learn perceptual tasks such as obstacle avoidance, navigation, and stair climbing.
Gibson closes the gap between its renderings and real-world data using a neural-network-based rendering approach.
The trained neural network processes information in two distinct directions: one makes the renderings look similar to the real environment (the forward function), while the other makes real camera imagery look like renderings (the backward function). Combining the two allows the network to produce matching outputs, bridging the two domains.
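The core idea can be illustrated with a toy numerical sketch: a forward function `f` maps a rendering toward the real-image domain, a backward function `u` maps a real image toward the same target domain, and training pushes their outputs together. The scalar "images" and one-parameter linear maps below are purely illustrative stand-ins for the paper's networks.

```python
def f(rendering, w):
    """Forward function: map a rendering toward the real-image domain."""
    return w * rendering


def u(real_image, v):
    """Backward function: map a real image toward the same shared domain."""
    return v * real_image


def bridge_loss(rendering, real_image, w, v):
    """Penalize disagreement between the two mapped representations."""
    return (f(rendering, w) - u(real_image, v)) ** 2


def train(rendering, real_image, w, v, lr=0.01, steps=200):
    """Hand-derived gradient descent on w and v to shrink the domain gap."""
    for _ in range(steps):
        diff = f(rendering, w) - u(real_image, v)
        w -= lr * 2 * diff * rendering        # d(loss)/dw
        v -= lr * 2 * diff * (-real_image)    # d(loss)/dv
    return w, v


w, v = train(rendering=1.0, real_image=2.0, w=0.5, v=0.5)
```

After training, the two mapped outputs nearly coincide, so an agent sees one consistent visual domain whether its input was rendered or captured by a real camera.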
The researchers built a dataset of 572 full buildings with 1,447 floors, covering a total area of 211,000 square meters, for Gibson. A neural-network-based view synthesis module and a physics engine were also included in the Gibson architecture.
Additional project info and the paper Gibson Env: Real-World Perception for Embodied Agents are available at stanford.edu.
Journalist: Fangyu Cai | Editor: Michael Sarazen