AAAI-17 Outstanding Paper Award: Label-Free Supervision of Neural Networks with Physics and Domain Knowledge, by Russell Stewart and Stefano Ermon
The AAAI Outstanding Paper Award honors papers that exemplify the highest standards in technical contribution and exposition. It is usually presented to researchers who achieve both breadth and uniqueness within computer science and who often bridge different departments and disciplines.
The recipient of this year’s Outstanding Paper Award, “Label-Free Supervision of Neural Networks with Physics and Domain Knowledge”, was inspired by the way humans learn: prior domain knowledge is used to constrain the output space to a specific structure, rather than learning a simple mapping from input to output. In this way, the paper avoids the need for a large amount of labeled data to supervise the neural network, and instead pushes the network to learn more advanced structures. Contemporary methods for learning without labels are known as unsupervised learning; the autoencoder is one example. These methods often cluster the input data into groups. Although they can be effective, their outputs generally lack a meaningful interpretation. In contrast to unsupervised learning, training without explicit labels but with known ground-truth laws reduces the amount of work spent on labeling and increases generality, since a single set of constraints can be applied to multiple data sets without relabeling.
There are two general ways to incorporate prior knowledge into supervised learning:
- By restricting the space of possible functions, that is, by specifying the hypothesis class 𝔽
- By adding an a priori preference for certain functions in 𝔽 through a corresponding regularization term R(f), where f ∈ 𝔽.
In this paper, the authors focus on designing a constraint function g: 𝕏 × 𝕐 ⟶ ℝ from prior knowledge, for example a law of physics, that penalizes outputs which deviate from that knowledge. The idea is to supervise with abstract, high-level reasoning rather than with mere labels. In this training scenario, the label y is used only for evaluation; it is not needed to discover the optimal function f* ∈ 𝔽. To make sure training converges to the correct f*, additional regularization terms may also be needed. Designing the constraint function g and the regularization terms is itself a form of supervision.
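Read as an optimization problem, the description above amounts to minimizing the constraint penalty over unlabeled inputs. The following is a sketch in the notation used here, assuming n unlabeled training inputs x_1, ..., x_n (the index i and the count n are introduced for illustration only):

```latex
\hat{f}^{*} = \arg\min_{f \in \mathbb{F}} \; \sum_{i=1}^{n} g\bigl(x_i, f(x_i)\bigr) + R(f)
```

No label y_i appears in this objective: the network output f(x_i) takes the place of the label inside g, which is why the labels are needed only for evaluation.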
This paper provides three examples:
- Tracking an object in free fall
- Tracking the position of a walking man
- Detecting objects with causal relationships.
The first example follows one of the simplest laws of physics: an object falling under no force other than gravity. When the object is thrown, its height over time therefore follows a parabola. Using this prior knowledge, we can design a constraint function that pushes the neural network's outputs toward such a trajectory. The training data set consists of 65 different trajectories, totaling 602 images. The CNN is trained for 4000 iterations with the Adam optimizer and a learning rate of 0.0001. The result is surprisingly good: the network trained with constraints reaches a correlation of 90.1%, compared to 94.5% for the same network trained on labels. Even without labels, the neural network still produces accurate outputs.
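To make the free-fall constraint concrete, here is a minimal sketch of one way such a penalty could be computed for the predicted heights of a single trajectory. It is an illustration under stated assumptions, not the authors' implementation: the function name `free_fall_penalty`, the time step `dt`, the value of `accel`, and the use of the textbook form y(t) = y0 + v0*t + 0.5*accel*t^2 are all choices made for this example.

```python
def free_fall_penalty(heights, dt, accel=-9.8):
    """Sum of absolute residuals between the predicted heights and the
    closest parabola y(t) = y0 + v0*t + 0.5*accel*t**2.

    The curvature is fixed by gravity (assumed value `accel`); only the
    initial height y0 and initial velocity v0 are fitted.
    Requires at least two frames.
    """
    n = len(heights)
    times = [(i + 1) * dt for i in range(n)]
    # Subtract the known gravity term; the remainder should be linear in t.
    detrended = [h - 0.5 * accel * t * t for h, t in zip(heights, times)]
    # Closed-form least squares for slope (v0) and intercept (y0).
    t_mean = sum(times) / n
    d_mean = sum(detrended) / n
    var_t = sum((t - t_mean) ** 2 for t in times)
    cov = sum((t - t_mean) * (d - d_mean) for t, d in zip(times, detrended))
    v0 = cov / var_t
    y0 = d_mean - v0 * t_mean
    # Penalty: how far the predictions are from the best physical trajectory.
    return sum(abs(d - (y0 + v0 * t)) for d, t in zip(detrended, times))


# Hypothetical predictions for five frames sampled every 0.1 s.
frame_times = [0.1, 0.2, 0.3, 0.4, 0.5]
on_parabola = [2.0 + 3.0 * t - 4.9 * t * t for t in frame_times]
flat_output = [1.0] * 5

physical = free_fall_penalty(on_parabola, dt=0.1)    # close to zero
unphysical = free_fall_penalty(flat_output, dt=0.1)  # clearly positive
```

Because the curvature is fixed, a network that outputs the same height for every frame is penalized, so this constraint does not admit a constant output as a solution.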
The second example is similar to the first, but its constraint function must be extended to avoid a convergence problem. In this scenario, the assumption of constant velocity approximately holds. Both experiments rely on the equations of motion; the only difference is that the gravity term disappears in the second experiment. The problem is that a network that outputs the same constant C for every frame satisfies the constraint perfectly, so training can collapse to this trivial solution unless we explicitly guard against it. We therefore need additional regularization terms that counterbalance the constraint function g and encourage the network's outputs to vary across frames. The training data set consists of 11 different trajectories across 6 different scenes, totaling 507 images. The hyper-parameters of the first experiment are kept unchanged to demonstrate their robustness. The resulting outputs are 95.4% correlated with the ground truth. By comparison, the same network trained with direct supervision reaches only 80.5% on the test set (99.8% on the training set). The authors attribute this lower performance to over-fitting on the small amount of training data (11 trajectories) and would expect a near-perfect correlation for a well-trained supervised classifier.
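The interplay between the constant-velocity constraint and its counterbalancing terms can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the equal weights, and the output range [0, 10] are assumptions made for this example. The idea is that a straight-line fit alone is satisfied by a constant output, so a term rewarding spread is added, together with a bound that stops the spread from growing without limit.

```python
import math


def line_fit_residual(positions):
    """Sum of absolute residuals from the best-fit straight line over the
    frame index (the constant-velocity constraint). Needs two or more frames."""
    n = len(positions)
    frames = list(range(n))
    f_mean = sum(frames) / n
    p_mean = sum(positions) / n
    var_f = sum((f - f_mean) ** 2 for f in frames)
    cov = sum((f - f_mean) * (p - p_mean) for f, p in zip(frames, positions))
    slope = cov / var_f
    intercept = p_mean - slope * f_mean
    return sum(abs(p - (intercept + slope * f)) for f, p in zip(frames, positions))


def spread_term(positions):
    """Negative standard deviation: lower (better) when the outputs vary."""
    n = len(positions)
    mean = sum(positions) / n
    return -math.sqrt(sum((p - mean) ** 2 for p in positions) / n)


def range_penalty(positions, low=0.0, high=10.0):
    """Hinge penalty keeping outputs inside the assumed range [low, high]."""
    above = max(max(0.0, p - high) for p in positions)
    below = max(max(0.0, low - p) for p in positions)
    return above + below


def walking_loss(positions, w_spread=1.0, w_range=1.0):
    """Constraint plus the two counterbalancing terms (weights are assumptions)."""
    return (line_fit_residual(positions)
            + w_spread * spread_term(positions)
            + w_range * range_penalty(positions))


constant_output = [5.0] * 5                 # trivial solution
moving_output = [1.0, 2.0, 3.0, 4.0, 5.0]   # constant velocity

trivial_loss = walking_loss(constant_output)
moving_loss = walking_loss(moving_output)
```

With only `line_fit_residual`, both outputs above would score equally well. Once the spread term is included, the moving trajectory receives the lower loss, which is the behavior the extra terms are meant to produce.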
The third example is more general: it is based not on real-world phenomena but on a logical, constraint-based formalism. This experiment explores the possibility of learning from logical constraints imposed on single images. The data set consists of random collections of Nintendo characters (Mario, Peach, Yoshi and Bowser), with each character's appearance varying slightly across frames due to rotation and reflection. The generating distribution encodes the underlying rule that Mario shows up whenever Peach appears (just as in the game itself: save Peach!). The task of the neural network is to detect Peach and Mario separately. Rather than supervising with direct labels, the networks are trained by constraining their outputs to satisfy the logical relationship y1 ⇒ y2, where y1 represents Peach and y2 represents Mario. In this setting, the network could always predict y1 = 1 and y2 = 1, since this output never violates the rule. Without any penalty or regularization, the output could therefore end up constant and meaningless (on an ROC curve, this corresponds to a probability of false detection and a probability of detection that both equal 1). To avoid this trivial solution, we need more intricate regularization terms that encourage the network to focus on the existence of objects rather than their location. On a test set of 128 images, the network learns to map each image to a correct description of whether it contains Peach and Mario.
Constraint learning is a generalization of supervised learning that allows for more creative methods of supervision. The approach leverages the representation learning abilities of modern neural networks and adds sufficiency terms when the primary constraint is merely necessary. Future challenges include extending these results to larger data sets with multiple objects per image, and simplifying the process of picking sufficiency terms for new and interesting problems. By freeing the operator from collecting labels, the authors' small-scale experiments show promise for the future of training neural networks with weak supervision.
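The reason the implication alone is not enough can be seen in a small sketch. The penalty below is one common soft relaxation of y1 ⇒ y2 for outputs in [0, 1]; it is an assumption chosen for illustration and is not claimed to be the loss used in the paper. The function names are likewise hypothetical.

```python
def implication_penalty(p_peach, p_mario):
    """Soft penalty for violating 'Peach implies Mario'.

    Both arguments are assumed to be outputs in [0, 1]. The penalty is
    largest when Peach is predicted present and Mario absent.
    """
    return p_peach * (1.0 - p_mario)


def batch_penalty(predictions):
    """Mean implication penalty over a batch of (p_peach, p_mario) pairs."""
    return sum(implication_penalty(p1, p2) for p1, p2 in predictions) / len(predictions)


# Always answering "both present" never violates the rule...
always_yes = [(1.0, 1.0)] * 4
# ...and neither does an informative set of predictions.
informative = [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0), (0.0, 1.0)]
# Only "Peach without Mario" is penalized.
violating = [(1.0, 0.0)]

trivial_score = batch_penalty(always_yes)
informative_score = batch_penalty(informative)
violating_score = batch_penalty(violating)
```

Since the trivial and the informative predictions both receive zero penalty, the constraint is necessary but not sufficient, and extra terms are needed to steer training toward the informative solution.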
Analyst: Arac Wu | Localized by Synced Global Team: Xiang Chen