“It’s an energy field created by all living things. It surrounds us and penetrates us; it binds the galaxy together.” That’s how Jedi Master Obi-Wan Kenobi explains “The Force” to Luke Skywalker. Invisible yet powerful forces of nature also fascinate today’s machine learning researchers, for whom understanding energy, forces and physics is critical to developing real-world AI applications.
The force is indeed everywhere. In physics, a force is any interaction that, when unopposed, changes the motion of an object. Humans rarely need to think twice to grasp the physics of most human-object interactions; we can, for example, simply watch a video and imitate the actions we see. For machines, however, this does not come so naturally.
In a new study, researchers from Facebook Artificial Intelligence Research, the University of Washington, UIUC, and Carnegie Mellon University use a physics simulator to learn to predict physical forces in videos of humans interacting with objects.
The researchers say current recognition-based and geometric approaches lack the physicality of action representation, and propose their method to improve machines’ physical understanding of human-object interactions: “While the goal of being able to infer these forces is desirable, it is unfortunately tedious (if not impossible) to acquire direct supervision for this task.” Supervised learning requires labelled training data, but obtaining ground-truth force labels remains an unsolved challenge. Is there an alternative source of supervision? The team observed that a full geometric understanding can be recovered by simulating the effects of physical forces on objects, and so used a physics simulator for supervision.
The goal of the simulation is to imitate the motions observed in the interaction videos. The researchers feed in videos of a human interacting with an object, and the model outputs the inferred object motion, contact points and forces. To learn how these interactions change object dynamics, the predicted forces are applied in a physics simulation, and the model is optimized by minimizing the error between the simulated object’s projection into the camera frame and the observed video, while also being trained to predict the contact points accurately.
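The simulate-and-compare loop can be sketched in miniature. In this illustrative toy (not the paper’s implementation), a unit-mass point stands in for the rigid object, a single constant 2D force stands in for the per-frame force predictions, and finite-difference gradient descent stands in for the paper’s optimizer; the real system uses a full rigid-body physics simulator and predicts forces at hand contact points. All function names here are hypothetical.

```python
import numpy as np

def simulate(force, steps=10, dt=0.1):
    """Euler-integrate a unit-mass point from rest under a constant 2D force."""
    xs, x, v = [], np.zeros(2), np.zeros(2)
    for _ in range(steps):
        v = v + force * dt      # F = ma with m = 1
        x = x + v * dt
        xs.append(x.copy())
    return np.array(xs)

def reprojection_loss(force, observed):
    """Squared error between the simulated and observed trajectories."""
    return float(np.sum((simulate(force) - observed) ** 2))

def fit_force(observed, lr=0.05, iters=500, eps=1e-4):
    """Recover the force by finite-difference gradient descent on the loss."""
    f = np.zeros(2)
    for _ in range(iters):
        grad = np.zeros(2)
        for i in range(2):
            e = np.zeros(2)
            e[i] = eps
            grad[i] = (reprojection_loss(f + e, observed)
                       - reprojection_loss(f - e, observed)) / (2 * eps)
        f -= lr * grad
    return f

# "Observed" trajectory generated by a hidden ground-truth force;
# the optimizer recovers the force purely from the motion it causes.
true_force = np.array([1.0, -0.5])
observed = simulate(true_force)
est = fit_force(observed)
```

The key idea carried over from the paper is that the force itself is never labelled: supervision comes entirely from how well the simulated motion matches the observed one.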
Object geometry can change throughout an interaction, for example when a hand bunches up a washcloth; and hand-to-object contact points can be hard to interpret, for example when fingers strum guitar strings. To keep the problem tractable, the team restricted the study’s setup, assuming each interaction involves a known rigid object and that only a five-fingered hand applies the force.
To train the system, the researchers collected a dataset of object manipulation videos showing various human participants grabbing and moving eight common objects: a pitcher, bleach bottle, skillet, drill, hammer, toy airplane, soup can and mustard bottle. They also annotated object keypoints in each frame and the 3D contact points of each interaction.
The team says the approach recovers meaningful forces from videos, along with their effects on predicted contact points, enabling accurate imitation of the observed motions in a physics simulation. Another valuable takeaway from the study is that contact point prediction and force prediction are highly correlated, and that optimizing them jointly improves performance on both tasks.
Researchers believe the study is a significant step forward in bringing action and perception together in a common framework, and that the model’s prediction of physical forces could also help accelerate robotic imitation learning procedures.
The paper Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects is on arXiv.
Journalist: Fangyu Cai | Editor: Michael Sarazen